Empowering users with low code DataHub

Context

Visual programming has been around for a while. It has been used intensively in multimedia tools, ETL tools and, more recently, in game engines.

And what about Scratch, the famous environment designed to teach programming to kids?
You'll find some of these references at https://en.wikipedia.org/wiki/Visual_programming_language.


I have been a Unity3D user for a while, and the latest versions of Unity make extensive use of graphs to design complex processing and its execution.
So, I tried to replicate what Unity3D and the previously mentioned tools provide and apply it to the MarkLogic DataHub Framework. One important point is that MarkLogic embeds the V8 JavaScript engine. It means that it is possible not only to design execution flows using visual programming, but also to execute them in the database, as close as possible to the data.

Motivations

The initial motivation comes from business contexts where business analysts are working with data. Analysts are highly knowledgeable about data meaning, quality and rules, and can even develop Excel macros (pseudo code) to manipulate the data. Excel, however, is part of the shadow IT, so it is important to move all these data silos and the associated logic back to shared and accessible environments. But there is a gap: Excel is in the toolbox of any business analyst, whereas the programming languages of the IT solutions usually are not.

The main motivation of visual programming here is to empower these users and avoid the maintenance backlog frustration syndrome (add to the backlog and wait for a release to come back sometime in the future). In the end, it should provide a solution to design and run any logic with a low code approach. Of course, the code will probably not be as optimised as it could be, but in my experience even bespoke development sometimes isn't either.

The building blocks

Never reinvent the wheel!


OK, so if you look on GitHub, you can find several projects that provide node-based programming frameworks. One of them is litegraph: quite simple, good looking and, even better, it provides both a UI and an engine to design and execute visual graphs in JavaScript.

With minimal changes, actually one line of code (a timer-based scheduler designed for Node.js rather than V8), the engine can run inside MarkLogic.
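As a rough sketch of what this looks like inside MarkLogic (the module path below is an assumption; litegraph.js would have to be deployed to the modules database first), the engine is simply configured from the serialised graph and stepped manually instead of relying on the Node.js timer loop:

  // Minimal sketch: run a serialised litegraph graph in MarkLogic server-side JavaScript.
  // The module path is hypothetical; litegraph.js must be deployed as a module first.
  const LGraph = require('/lib/litegraph.js').LGraph;

  const graphDefinition = { /* serialised graph produced by the designer */ };

  const graph = new LGraph();
  graph.configure(graphDefinition);  // rebuild nodes and links from the serialised JSON
  graph.runStep();                   // execute one pass; no timer/scheduler needed in V8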

Then, in order to bundle the graph designer area into a web app, an all-in-one framework is great to quickly create a front end. Quasar (based on Vue.js) is currently the best framework I have found for the PoC and the concepts I am working on.

Building the tool

With these building blocks, it's quite easy to implement the UI below (considering I'm not a developer, it must be easy).


This UI is connected to MarkLogic in order to introspect any source available in the staging database (left-hand panel); it can then suggest the fields to be exposed as plugs by a block. The UI does the same with the entities defined in the DataHub (right-hand panel).
The Litegraph framework allows blocks to be registered on the fly, so to make this useful, the UI creates blocks, based on its settings, to manipulate the data.
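To give a flavour of how this works (a sketch only; the block name, the field list and the way the current document reaches the graph are assumptions, not the actual Pipes implementation), a source block generated from the introspected fields could be registered with litegraph roughly like this:

  // Sketch: register a block whose output plugs are the fields discovered in a staging source.
  function CustomerSourceBlock() {
    // one output plug per introspected field (illustrative field names)
    ['firstName', 'lastName', 'birthDate'].forEach(f => this.addOutput(f, 'string'));
  }
  CustomerSourceBlock.title = 'Source: Customer';
  CustomerSourceBlock.prototype.onExecute = function () {
    const doc = this.graph.inputDocument;  // assumption: the current document is attached to the graph
    this.setOutputData(0, doc.firstName);
    this.setOutputData(1, doc.lastName);
    this.setOutputData(2, doc.birthDate);
  };
  LiteGraph.registerNodeType('sources/customer', CustomerSourceBlock);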

Flow creation with reusable building blocks 

A graph can describe all the actions to be executed in the flow, based on reusable building blocks. So the flow can describe a 1-1 mapping from source to entity, a value conversion, but also any logic that takes values as inputs and delivers outputs (what about an Excel VLOOKUP-like function, for example? See the sketch after this list).
Examples of building blocks:
  • Value mapper
  • Date/format converters
  • Business entities based on ontology (Derived from Entity definition)
  • Merging module with rules
  • Source entity based on data introspection
  • Etc.
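As a sketch of such a VLOOKUP-like block (the reference document URI and its structure are purely illustrative), the key arrives on an input plug and the looked-up value leaves on an output plug:

  // Sketch: VLOOKUP-like block that looks a key up in a reference document stored in MarkLogic.
  function VLookupBlock() {
    this.addInput('key', 'string');
    this.addOutput('value', 'string');
    this.properties = { referenceUri: '/reference/country-codes.json' };  // illustrative URI
  }
  VLookupBlock.title = 'VLOOKUP';
  VLookupBlock.prototype.onExecute = function () {
    // e.g. the reference document contains { "FR": "France", "DE": "Germany" }
    const table = cts.doc(this.properties.referenceUri).toObject();
    this.setOutputData(0, table[this.getInputData(0)]);
  };
  LiteGraph.registerNodeType('lookup/vlookup', VLookupBlock);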

Low level MarkLogic API development

By creating proxies to all MarkLogic APIs, it is even possible to expose all API functions as pluggable boxes. The example below was created by generating the proxies from the API documentation.
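As an illustration (a hedged sketch, not the code actually generated by Pipes), a proxy block wrapping a built-in such as fn.currentDate could look like this:

  // Sketch: a generated proxy block delegating to a MarkLogic built-in function.
  function FnCurrentDateBlock() {
    this.addOutput('date', 'xs.date');
  }
  FnCurrentDateBlock.title = 'fn.currentDate';
  FnCurrentDateBlock.prototype.onExecute = function () {
    this.setOutputData(0, fn.currentDate());  // call the MarkLogic built-in
  };
  LiteGraph.registerNodeType('marklogic/fn/currentDate', FnCurrentDateBlock);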

And a lot more...

Considering a block can implement any logic and can have its own settings, it's also possible to provide high level features such as:

Value mapping

Maps a value from input to output based on a dictionary which is configured in the UI.
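A sketch of such a block, assuming the dictionary entered in the UI ends up in the block's properties (the mapping values below are illustrative):

  // Sketch: value mapper block; the dictionary is edited in the UI and stored in the block properties.
  function ValueMapperBlock() {
    this.addInput('in', 'string');
    this.addOutput('out', 'string');
    this.properties = { mapping: { M: 'Male', F: 'Female' }, defaultValue: 'Unknown' };
  }
  ValueMapperBlock.title = 'Value mapper';
  ValueMapperBlock.prototype.onExecute = function () {
    const v = this.getInputData(0);
    this.setOutputData(0, this.properties.mapping[v] || this.properties.defaultValue);
  };
  LiteGraph.registerNodeType('mapping/valueMapper', ValueMapperBlock);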

Customer transactions:
The block takes the customer ID as an input and returns the transaction statistics configured in the block. In the configuration example, it would be the sum of the transactions performed during the last 365 days.
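A hedged sketch of the underlying query (assuming transactions are stored as JSON documents with customerId, amount and date properties, in a "transactions" collection, with a dateTime range index on date):

  // Sketch: sum the transactions of one customer over the last 365 days.
  // Collection, property names and document shape are assumptions for illustration.
  function sumTransactions(customerId) {
    const cutoff = new Date(Date.now() - 365 * 24 * 3600 * 1000).toISOString();
    const query = cts.andQuery([
      cts.collectionQuery('transactions'),
      cts.jsonPropertyValueQuery('customerId', customerId),
      cts.jsonPropertyRangeQuery('date', '>=', xs.dateTime(cutoff))
    ]);
    let total = 0;
    for (const doc of cts.search(query)) {
      total += doc.toObject().amount;
    }
    return total;
  }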

PROV-O:
For many MarkLogic users, PROV-O is a very interesting capability. So the graph can generate provenance data to be stored alongside the data.
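For illustration (a sketch; the document URIs are made up and the predicate choice is mine), the provenance could be expressed as PROV-O triples generated next to the harmonised document:

  // Sketch: record that the harmonised document was derived from its staging source,
  // using PROV-O predicates (document URIs are illustrative).
  const provo = 'http://www.w3.org/ns/prov#';
  const triples = [
    sem.triple(sem.iri('/customer/123.json'),
               sem.iri(provo + 'wasDerivedFrom'),
               sem.iri('/staging/customer-raw/123.json')),
    sem.triple(sem.iri('/customer/123.json'),
               sem.iri(provo + 'generatedAtTime'),
               fn.currentDateTime())
  ];
  // the triples can then be stored alongside the data, e.g. as embedded triples in the envelope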


Etc.

Combined with MarkLogic DHF

Now that we have a designer that can draw the logic, we can allow its execution as part of a DHF flow. DHF 5 allows the creation of custom steps that run any code. In order to streamline the integration of the executable graph with DHF, I created an export function in the designer that produces ready-to-run DHF custom step code, embedding the graph definition (as serialised JSON) and the execute call provided by the Litegraph framework.
For people familiar with MarkLogic, it means the exported code just has to be copied and pasted into the Gradle DHF project; then deploy the module and the flow is ready to run.
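The exported module roughly follows the DHF 5 custom step contract, something like the sketch below (the module path and the way the graph exchanges documents with the step are assumptions; the actual code generated by Pipes may differ):

  // Sketch of an exported DHF 5 custom step: the serialised graph is embedded
  // in the module and executed for each content item.
  const LGraph = require('/custom-modules/litegraph.js').LGraph;  // hypothetical path
  const graphDefinition = { /* serialised graph exported by the designer */ };

  function main(content, options) {
    const graph = new LGraph();
    graph.configure(graphDefinition);
    graph.inputDocument = content.value.toObject();  // assumption: expose the source document to the blocks
    graph.runStep();
    content.value = graph.outputDocument;            // assumption: the graph produces the harmonised instance
    return content;
  }

  module.exports = { main };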


What's next?

Here is the recording:


If you are interested, don't hesitate to contact me; all the code is open source:
https://github.com/marklogic-community/pipes


