Data Sharing is sometimes not enough: Regulated Data Sharing with the Data Clean Room in Snowflake
We mentioned in a previous blog post that Snowflake Data Sharing is a game changer for making the extended enterprise happen. Data Sharing in Snowflake allows parties to seamlessly share datasets and data functions without moving the data. It gives the consumer full read access to the dataset shared by the producer, which is incredibly powerful for combining data from internal and third-party sources on the fly.
When "open" sharing is not enough
Sometimes, however, regulatory constraints or simply competitive motivations make it impossible to share the full dataset.
- This is especially true when the data contains personal information and where legislation such as the GDPR or the CCPA restricts what is possible without explicit consent from the individuals.
- It is also true when, for competitive reasons (data is the new gold), the provider does not want the consumer to access the full row-level dataset with each individual record, but only wants to let the consumer compute aggregates across the provider's data and its own.
In such circumstances, a Data Clean Room provides controlled access to the share: the consumer can usually run only aggregation queries on the joined datasets, which ensures it can never retrieve the provider's enrichments at record level. Data Clean Rooms are used extensively in ads and media operations, especially since the introduction of the GDPR and now with the upcoming end of third-party cookies.
Data Clean Room illustration
A simplified representation is the following: two parties each hold knowledge about a person (or an asset) and want to pool this knowledge to generate new insights (segmentation, for example).
In this example, the consumer who queries the data cannot retrieve the enrichment of any individual record, but it can retrieve the aggregates (the final step below). Aggregates based on too few individual records are excluded, so that results cannot be matched back to the original records.
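The idea of returning only aggregates, and dropping the ones built on too few records, can be sketched in a few lines of Python. This is a minimal illustration and not how Snowflake implements it: the function name `clean_room_aggregate`, the sample keys and segments, and the threshold of 3 are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical threshold: groups smaller than this are never returned.
MIN_GROUP_SIZE = 3


def clean_room_aggregate(provider_rows, consumer_keys, min_group_size=MIN_GROUP_SIZE):
    """Join both parties' data on a shared key and return only segment counts.

    provider_rows: iterable of (person_key, segment) pairs held by the provider
    consumer_keys: iterable of person_key values held by the consumer
    """
    segment_by_key = dict(provider_rows)
    counts = defaultdict(int)
    for key in set(consumer_keys):
        segment = segment_by_key.get(key)
        if segment is not None:
            counts[segment] += 1
    # Suppress small groups so an aggregate cannot be traced back to individuals.
    return {seg: n for seg, n in counts.items() if n >= min_group_size}


# Illustrative data only.
provider = [("k1", "sports"), ("k2", "sports"), ("k3", "sports"),
            ("k4", "travel"), ("k5", "travel")]
consumer = ["k1", "k2", "k3", "k4", "k9"]

result = clean_room_aggregate(provider, consumer)
print(result)
```

Here the consumer matches three people in the "sports" segment and only one in "travel", so only the "sports" count is returned; the consumer never sees which key belongs to which segment.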
A global data sharing capability
But as we mentioned before, the use of a Data Clean Room is not restricted to ads and media. Moreover, the Data Clean Room is the answer for regulated data sharing, but what about all the other data sharing opportunities where the shares are fully "open", without restriction? It is indeed highly relevant to build a "360 view" of a person or asset from internal sources (CRM, transactional systems, etc.) and open third-party data, and then to match this 360 view against "regulated" shares from a third party in order to produce aggregates and additional insights.
Here comes Snowflake with a global Data Sharing proposition:
With Snowflake, you can perform "standard" data sharing between multiple parties, and you can also implement a Data Clean Room in the same solution, leveraging Data Sharing capabilities while applying anonymization and "hidden" aggregation on the fly. The Snowflake Data Clean Room can ensure full data privacy while still allowing ad hoc aggregation queries by the consumer.
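One common way to keep raw identifiers out of the matching step is to compare salted hashes instead of the identifiers themselves. The sketch below is only an assumption about how this could look, not a description of the Snowflake mechanism: the `pseudonymize` helper, the salt value, and the email addresses are hypothetical, and hashing alone is pseudonymization rather than full anonymization.

```python
import hashlib

# Hypothetical salt agreed between the two parties out of band.
SHARED_SALT = b"example-salt"


def pseudonymize(identifier, salt=SHARED_SALT):
    """Return a salted SHA-256 digest of a normalized identifier."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hashlib.sha256(salt + normalized).hexdigest()


# Each party hashes its own identifiers before any comparison takes place.
provider_keys = {pseudonymize(e) for e in ["Ann@example.com", "bob@example.com"]}
consumer_keys = {pseudonymize(e) for e in [" ann@example.com", "carol@example.com"]}

# Only the size of the overlap is exposed, not the matching identifiers.
overlap_count = len(provider_keys & consumer_keys)
print(overlap_count)
```

Normalizing before hashing (trimming whitespace, lowercasing) matters because two spellings of the same address would otherwise produce different digests and never match.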