Posts

What data challenge will you tackle this new year? One year working with partners at Snowflake

With the new year comes the retrospective. This is also an opportunity to give some visibility into what the partner sales engineer role means. We currently have 2 positions open in Europe (UK and Germany), with a lot more to come! Don't hesitate to contact me to learn more. 2021, the year of Data Mesh: Data Mesh has been a trending architecture topic for several months. It now comes up in most of the discussions we have with large accounts (with good data maturity). Of course, it resonates very well with the Snowflake Data Cloud and its ability to seamlessly share data between Snowflake accounts ("Domains") wherever they are, as well as with all the elastic and scalable resources that can be allocated to these domains. Domain centric architecture: Actually, a year ago, before I knew about Data Mesh architecture principles, I wrote a blog post, "Domain centric architecture : Data driven business process powered by Snowflake Data Sharing". What a mistake! I should have called it ...

Process XML, JSON and other sources with XQuery at scale in the Snowflake Data Cloud

In the past few years, I have worked with XML and JSON technologies in databases. There are still tons of applications, especially in Manufacturing or Finance, where XML is leveraged as a data exchange standard. In this new short blog post, the objective is to create a function in Snowflake able to take data from XML, JSON (coming from a VARIANT type) or even from a standard table, view or Snowflake Share and apply an arbitrary XQuery to generate an output in XML (or any other output format thanks to XQuery 3). This function of course runs in the Snowflake compute layer and takes advantage of all Snowflake capabilities. The dependencies: First we have to load the Java dependencies into a Snowflake stage. For this illustration, we again use the user stage. put file:///Users/xxxxx/Downloads/saxon-he-test-10.5.jar @~/javalib/ auto_compress = false overwrite = true; ...

Let's create datasets for demonstrations in a few lines of SQL and a Snowflake Java UDF

In presales activities, a recurring topic is how to get a dataset to illustrate a scenario. With Snowflake, a first answer is of course "let's see what I can find on the Marketplace". However, it's also useful to have fake customer records, fake transaction records and other datasets that can be used to illustrate retail, insurance and other industry scenarios. For that you can leverage a fake data generator, generate datasets and then import them into Snowflake. With Snowflake Java UDFs, it's now possible to have a generator embedded in Snowflake to generate any volume of data at any time with a few lines of SQL. Here are the 3 simple steps to get your brand new fake datasets ready to go: Loading the JAR into a Snowflake stage: First we will load the Faker library into Snowflake. For today, we don't want to build a new library based on Faker but just load the library as is and leverage it in Snowflake. You can get the JAR from Maven. So we load Faker (the ge...

Data Sharing is sometimes not enough: Regulated Data Sharing with the Data Clean Room in Snowflake

We mentioned in a previous blog post that Snowflake Data Sharing is a game changer to make the extended enterprise happen. Data Sharing in Snowflake allows parties to seamlessly share datasets and data functions without moving the data. It provides full read access to the dataset shared by the producer, which is incredibly powerful for combining, on the fly, data from internal and third-party sources. When "open" sharing is not enough: However, sometimes, due to regulatory constraints or just competitive motivations, it's not possible to share the full dataset. It's especially true when the data contains personal information and where legislation such as GDPR or CCPA restricts what is possible without explicit consent from the individuals. It's also true if, for competitive reasons (data is the new gold), the provider doesn't want the consumer to access the full row-level dataset with each individual record but just wants to provide the ability to per...

Domain centric architecture : Data driven business process powered by Snowflake Data Sharing

A few months ago I was talking with the Data Management director of a large manufacturer. We were discussing the architecture principles of having a domain centric data service layer which would feed business processes with harmonised, contextualised and validated data. In each domain, the data would be owned by domain experts who have extensive knowledge of the data requirements, source characteristics and usage. Each domain then serves the data "as a service" for it to be consumed and joined with other domains by business processes. This vision provides the ability to quickly drive new use cases and value by leveraging preexisting domains (and related effort) without reinventing the overall data stack in the new "product". However, the realisation at scale faces multiple challenges and barriers: currently the data is usually moved multiple times to be prepared and then sometimes moved again to each individual product that requires it, which creates delays, gov...

Snowflake Data sharing is a game changer : Be ready to connect the dots (with a click)

In a previous blog I talked about the extended enterprise and how Snowflake can provide the data architecture to support it. Today I will try to illustrate it with a more realistic scenario. The scenario comes from my previous experience in manufacturing but can be adapted to any industry: several parties need to share their data to get the full picture of a product and then estimate the production costs. This could be part of a Design-to-Cost business process aiming at optimising product design and fabrication in order to: use as many off-the-shelf components as possible; reduce raw material usage; reduce the product complexity; mutualise components; reduce waste; eliminate non-desired features. In order to make it easy to understand, I will simplify the data processing part of the scenario and keep the most valuable dimension, the sharing. A composite product: Let's imagine we are dealing with a company producing a product and working with several third parties and plants (subsidiari...