What data challenge will you tackle this new year? One year of Solution design at MarkLogic

As we start a new year, let's look back at some of the main MarkLogic-powered solutions I worked on in 2017.





We are going to give an overview of:
  • "Big-ECM" Solution - Insurance
  • Product Testing Datahub - Manufacturing and others...
  • Customer 360 - All industries
  • Semantic integration layer for PLM - Manufacturing
  • Virtual MarkLogic World - Insurance


Big ECM or how to make the most of documents

The challenge

For decades, insurers have been dealing with hundreds of millions, sometimes billions, of documents, usually stored in multiple siloed ECM systems as a result of successive mergers and acquisitions.
Most of the time, documents are considered only through their attached metadata, which is limited to basic client references.
But documents received from clients are a key dimension of the customer 360. Combined with other customer knowledge, they can produce relevant insights to better serve and retain clients:
  • Semantic analysis:
    • Sentiment analysis
    • Entity extraction from legal documents
  • Full-text search combined with structured profile data (CRM, policies, etc.)

The solution

What they need is a solution that can leverage, securely and at scale, whatever information can be extracted from the documents (metadata) and combine it with the customer knowledge they already hold in multiple silos.

This is Customer 360, and it is where MarkLogic excels: a transactional operational data hub that stores heterogeneous data and combines semantics, search and query, at high volumes and with high security standards.

Moreover, MarkLogic can dynamically adapt the document classification to the business user's context. This is performed by leveraging semantics and inference at query time.
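To make "inference at query time" a bit more concrete, here is a minimal sketch in plain Python. It is not MarkLogic's API: the class names, the two business contexts and the `infer_classes` helper are all invented for the illustration. The point is only that nothing is written onto the documents; the classification is derived from the rules of the current context each time the query runs.

```python
# Hypothetical sketch: classification resolved at query time from
# subclass-style triples. All names and data below are made up.
from collections import deque

# (subject, "subClassOf", object) triples, one rule set per business context.
RULES = {
    "claims": [
        ("MedicalReport", "subClassOf", "ClaimEvidence"),
        ("ClaimEvidence", "subClassOf", "ClaimFile"),
    ],
    "compliance": [
        ("MedicalReport", "subClassOf", "SensitiveData"),
        ("SensitiveData", "subClassOf", "RegulatedContent"),
    ],
}

def infer_classes(doc_type, context):
    """Return every class the document type belongs to under the context's rules."""
    parents = {}
    for subject, _, obj in RULES[context]:
        parents.setdefault(subject, []).append(obj)
    # Breadth-first walk up the subclass hierarchy (transitive closure).
    seen, queue = set(), deque([doc_type])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

# Same document type, two different classifications depending on who asks.
print(sorted(infer_classes("MedicalReport", "claims")))
print(sorted(infer_classes("MedicalReport", "compliance")))
```

Changing a rule set changes the result of the next query immediately, with no reprocessing of the stored documents.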


MarkLogic capabilities at the core of the solution:

  • Transactional database
  • Schema Agnostic
  • Semantics
  • Full-text search in multiple languages
  • Encryption at Rest


Product Testing DataHub

The challenge

Product testing specialists collect massive amounts of data to answer test case scenarios. These data are analyzed to produce consolidated test reports, but the raw data can rarely be reused afterwards.


However, the answers to complex business questions are probably already there, hidden in the big (raw) data in a data lake and in multiple silos.

Let's consider timestamped IoT measurements coming from multiple sensors, report documents from experts (unstructured content), and product configurations and test conditions (structured content).


Now let's say you need to find the specific periods of time in the tests where temporal patterns occur: it can be how an aircraft reacts to a sequence of successive events, or how a patient's state evolves based on measures and specialist reports.
To analyse such patterns, you need to analyse not only time series but also unstructured documents (reports, client feedback, social media interactions), structured data (test conditions, external factors) and sometimes geospatial data. As the analysis requires expert supervision, the whole solution must interact with end users with response times in seconds, not hours or days.


The solution

Here we leverage all of MarkLogic's multi-model capabilities: document storage, triple store, query/search and complex indexes (including geospatial).
The solution provides fast exploration tools to help business users identify where to look (a particular test, a particular moment in time). The expert can then dive deep into the details by leveraging the raw data.
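As a simplified illustration of the "where to look" step, the sketch below scans one timestamped sensor series for periods where a value stays above a threshold long enough to be worth an expert's attention. It is plain Python over an in-memory list, with made-up readings and a hypothetical `find_windows` helper; the real solution relies on MarkLogic indexes rather than a linear scan.

```python
# Hypothetical sketch: locate time windows where a measure stays above
# a threshold for at least a minimum duration.
from datetime import datetime, timedelta

def find_windows(readings, threshold, min_duration):
    """readings: list of (timestamp, value) sorted by time.
    Returns (start, end) pairs for runs above threshold lasting >= min_duration."""
    windows, start, last = [], None, None
    for ts, value in readings:
        if value > threshold:
            if start is None:
                start = ts          # a new run begins
            last = ts               # extend the current run
        else:
            if start is not None and last - start >= min_duration:
                windows.append((start, last))
            start = None            # run is over
    # Close a run that is still open at the end of the series.
    if start is not None and last - start >= min_duration:
        windows.append((start, last))
    return windows

# Made-up readings, one per second.
t0 = datetime(2017, 1, 1, 12, 0, 0)
values = [1.0, 5.2, 5.5, 5.1, 1.2, 6.0, 1.1]
readings = [(t0 + timedelta(seconds=i), v) for i, v in enumerate(values)]

hits = find_windows(readings, threshold=5.0, min_duration=timedelta(seconds=2))
for start, end in hits:
    print(start.time(), "->", end.time())
```

Each window returned is a candidate period; the expert would then open the reports and raw data attached to that test and time range.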

Where is it applicable?

  • Aerospace and Manufacturing product testing
  • Clinical trials: how patients respond to treatment, based on measures and specialist reports
  • Intelligence: behavior pattern search
  • and more...

MarkLogic capabilities at the core of the solution:

  • Multimodel, Multimodel, Multimodel
  • Semantics / Ontologies
  • Search & Query
  • Universal Index / Specialised indexes
  • Geospatial

I'll come back to this solution in another post.


Customer 360 meets semantics

The challenge

Customer 360 and the golden record are a typical use case for the MarkLogic data hub: ingest heterogeneous content coming from multiple silos, then apply matching rules to deduplicate it and generate the golden record. Operational applications then consume the golden record for recommendations, segmentation, self-care (including record enrichment) or analytics.
The main challenge is to stay agile as the number of sources grows and the matching rules change.


The solution


MarkLogic provides "easy in" capabilities by loading the sources as is, thanks to its schema-agnostic storage. The source records can be loaded with no effort and nothing is left behind. MarkLogic can keep track of every object property and all metadata, which is especially important for tracking user consent to meet GDPR requirements.
As soon as the data are loaded, MarkLogic can lift into the semantic graph whatever field is required to perform the matching. In fact, with MarkLogic Template Driven Extraction, the triple store is always up to date at record transaction time. In the illustration below, inference is used to apply matching rules at query time, which makes it possible to update a rule and apply it immediately.

A video is better than sentences so here it is:


MarkLogic capabilities at the core of the solution:

  • Schema Agnostic
  • Semantics
  • Inference
  • Phonetic search (Double Metaphone)
  • Text distance / fuzzy match
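
To illustrate why keeping matching rules as data keeps things agile, here is a minimal sketch in plain Python. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for text distance; it is not MarkLogic's matching implementation, and the records, field names and thresholds are invented. A rule is just a mapping from field to minimum similarity, so swapping the rule changes the outcome of the very next query.

```python
# Hypothetical sketch: fuzzy record matching driven by rules held as data.
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_match(rec_a, rec_b, rule):
    """rule maps a field name to the minimum similarity required on that field."""
    return all(
        similarity(rec_a[field], rec_b[field]) >= minimum
        for field, minimum in rule.items()
    )

# The same customer as seen by two silos (made-up data).
crm = {"name": "Jon Smith", "city": "Paris"}
billing = {"name": "John Smith", "city": "Paris"}

strict_rule = {"name": 1.0, "city": 1.0}     # exact match only
relaxed_rule = {"name": 0.85, "city": 1.0}   # tolerate small spelling differences

print(is_match(crm, billing, strict_rule))
print(is_match(crm, billing, relaxed_rule))
```

Because the rule is evaluated when the question is asked, no record has to be reloaded or reprocessed when the business decides to relax or tighten it.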


Semantics as a universal layer for PLM systems

The challenge

Large manufacturers have to manage multiple product programs in parallel. PLM systems have a long lifecycle, which can exceed 20 years. To leverage the assets of legacy PLMs, reuse components and accelerate time to market, manufacturers need to make these systems communicate through a shared knowledge representation.
Here comes the semantics.

The solution

Using MarkLogic, it's possible to load parts, assemblies and product structures from heterogeneous sources. Using Template Driven Extraction (extraction of triples from the data in documents based on templates), it’s then possible to expose the data through a shared semantic lens.
Moreover, MarkLogic can also manage complex effectivity rules by leveraging its reverse query capability. The effectivity rules in the product structure can be converted into queries stored alongside the data and the links themselves. This produces a way to apply effectivity in real time on large product structures that is both elegant and highly scalable.
I’ll have the opportunity to come back to this solution in the coming weeks.
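
In the meantime, here is a minimal sketch of the effectivity idea in plain Python. It is an assumption-laden toy, not MarkLogic code: the part names, the serial-number ranges and the `effective_children` helper are invented, and the stored rules are checked with a simple loop, whereas a reverse query resolves the matching stored queries through indexes. What it shows is the inversion: the rule lives with each link, and the unit being configured is matched against the stored rules.

```python
# Hypothetical sketch: effectivity rules stored alongside product-structure links.

# Each link carries its own rule: the serial-number range for which it applies.
# serial_to = None means "no upper bound".
links = [
    {"parent": "wing", "child": "flap-v1", "rule": {"serial_from": 1, "serial_to": 99}},
    {"parent": "wing", "child": "flap-v2", "rule": {"serial_from": 100, "serial_to": None}},
    {"parent": "wing", "child": "spar", "rule": {"serial_from": 1, "serial_to": None}},
]

def rule_matches(rule, serial):
    """True when the given serial number falls inside the rule's range."""
    upper = rule["serial_to"]
    return rule["serial_from"] <= serial and (upper is None or serial <= upper)

def effective_children(parent, serial):
    """Children of `parent` whose stored rule matches this unit's serial number."""
    return [
        link["child"]
        for link in links
        if link["parent"] == parent and rule_matches(link["rule"], serial)
    ]

print(effective_children("wing", 42))
print(effective_children("wing", 150))
```

The product structure for a given unit is therefore computed on demand from the links whose rules match, rather than stored once per configuration.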



Some Fun in February: Virtual MarkLogic World

MarkLogic World is, every year, the best place to learn about MarkLogic and meet our clients and teams.
Virtual MarkLogic World is a good place to make use cases real.




February is the time to kick off the year with a demo Jam.

For this, some colleagues and I built an insurance use case demo in a virtual environment.




Customer 360 is combined with geospatial context to deliver real-time alerts (e.g. based on contract terms) or personalized offers.



MarkLogic capabilities at the core of the solution:

  • Reverse queries / Alerting
  • Geospatial
  • REST Extensions
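
As a rough sketch of how geospatial context and contract terms can combine into an alert, the plain-Python example below stores alert definitions (a circular zone plus a required contract type) and checks a customer's position against them. Everything here is illustrative: the alert ids, coordinates, radii and contract names are made up, the distance uses a simple haversine formula, and the demo itself relied on MarkLogic alerting and geospatial indexes rather than this loop.

```python
# Hypothetical sketch: position-based alerts filtered by contract terms.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

# Stored alert definitions (made-up data): zone centre, radius, required contract.
alerts = [
    {"id": "flood-zone", "center": (48.8566, 2.3522), "radius_km": 5.0, "requires": "home"},
    {"id": "ski-cover", "center": (45.9237, 6.8694), "radius_km": 20.0, "requires": "travel"},
]

def triggered_alerts(position, contracts):
    """Ids of alerts whose zone contains the position and whose contract the customer holds."""
    lat, lon = position
    return [
        alert["id"]
        for alert in alerts
        if alert["requires"] in contracts
        and haversine_km(lat, lon, *alert["center"]) <= alert["radius_km"]
    ]

print(triggered_alerts((48.86, 2.35), {"home", "travel"}))
print(triggered_alerts((48.86, 2.35), {"travel"}))
```

The same position yields different alerts for different customers, because the contract terms are part of the stored rule.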


A topic for an upcoming publication: Unity3D for presales demo


These solutions were my main topics for 2017. 2018 is already here, and manufacturing is still one of my focus areas at the moment.




I wish you all a happy new year
full of data challenges!

If you have data challenges for 2018,
let's talk!






My own 2017 toolbox as a Solutions Architect

  • MarkLogic, MarkLogic, MarkLogic...
  • My very talented MarkLogic colleagues
  • Passion
  • Creativity
  • 10+ years of consulting...
  • Unity3D: a game engine, but also a wonderful environment for telling real-life stories
  • Aurelia.io: such a clean front-end framework! (I'm not a developer, so let's keep things simple)
  • visjs: yes, MarkLogic is also a triple store, so graph visualisation is great
  • echarts: what a great lib for clean and beautiful dataviz
  • PowerPoint and Keynote of course...

For 2018, take the same plus:

