07-10-2019 05:45 AM - last edited on 07-10-2019 07:45 AM by cjervis
I would like to capture end-to-end metadata from the source (a Kafka producer) through to its final consumption. Assume the producer submits the feed, NiFi picks it up from a Kafka topic, transforms it, and saves it into Hive.
To get this end-to-end lineage, is the following approach correct?
1) Onboard the new feed by manually defining a new type in Navigator for the source, e.g. Kafka_Source_1 (the Kafka producer).
2) Assume Navigator has a hook that automatically creates Kafka topics as types in Navigator, e.g. Kafka_Topic_1.
3) As the producer keeps sending messages to Kafka, NiFi reads them, performs the transformation, extracts the technical and business metadata, and then creates the related instances (business glossary, operational metadata entities) in Navigator by calling the REST API (a custom implementation in the NiFi data pipeline), including creating the Hive endpoint as a type. (I wish NiFi would manage the metadata internally and then export it to Atlas in batches.) Is there an easier way of doing this?
4) Thus, in Navigator, I will see the lineage as: Kafka_Source_1 --> Kafka_Topic_1 --> Message_Transformation_112121 --> Hive_Path_1
If the above flow is OK,
5) Assuming NiFi calls the REST API to create the instances for each bulk read from Kafka, will the duplicate definitions be consolidated within Navigator to give a single lineage line from this source to the destination?
6) If the same feed has its schema (or any of its business, technical, or operational metadata) changed, how will Navigator capture this change? Will it create a new lineage or a new version of the existing one? In other words, how is dynamic metadata managed?
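To make points 3) to 6) concrete, here is a minimal Python sketch of the behaviour I am hoping for from the custom step in the NiFi pipeline. It does not call Navigator at all: the `Entity` class, the `entity_id`, `schema_fingerprint` and `register_batch` helpers, and the in-memory `registry` dict are hypothetical names I made up for illustration, standing in for whatever the REST API calls would actually create. The assumption is that each bulk read derives the same identity for the same source, topic, transformation and Hive path, so repeated registrations collapse into one lineage line, while a schema change shows up as a new fingerprint on the existing entities.

```python
import hashlib
import json
from dataclasses import dataclass, field


def entity_id(entity_type: str, name: str) -> str:
    """Deterministic identity: the same type and name always give the same id."""
    return hashlib.sha256(f"{entity_type}:{name}".encode("utf-8")).hexdigest()[:16]


def schema_fingerprint(schema: dict) -> str:
    """Stable hash of a schema; key order does not affect the result."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


@dataclass
class Entity:
    # Hypothetical stand-in for an entity that would live in the metadata store.
    entity_type: str
    name: str
    schema_versions: list = field(default_factory=list)

    @property
    def id(self) -> str:
        return entity_id(self.entity_type, self.name)


def register_batch(registry: dict, chain: list, schema: dict) -> list:
    """Register one bulk read and return its lineage edges as (from_id, to_id).

    Existing entities are reused instead of duplicated. As a simplification,
    the schema fingerprint is recorded on every entity in the chain.
    """
    fingerprint = schema_fingerprint(schema)
    ids = []
    for entity_type, name in chain:
        key = entity_id(entity_type, name)
        entity = registry.setdefault(key, Entity(entity_type, name))
        if fingerprint not in entity.schema_versions:
            entity.schema_versions.append(fingerprint)
        ids.append(entity.id)
    return list(zip(ids, ids[1:]))


# The lineage from point 4), expressed as (type, name) pairs.
chain = [
    ("source", "Kafka_Source_1"),
    ("kafka_topic", "Kafka_Topic_1"),
    ("transformation", "Message_Transformation_112121"),
    ("hive", "Hive_Path_1"),
]

registry = {}
# Two bulk reads with the same schema (different key order), then a changed schema.
edges_1 = register_batch(registry, chain, {"id": "int", "name": "string"})
edges_2 = register_batch(registry, chain, {"name": "string", "id": "int"})
edges_3 = register_batch(registry, chain, {"id": "int", "name": "string", "ts": "timestamp"})
```

After the three bulk reads, the registry still holds only four entities joined by the same three edges, and each entity carries two schema fingerprints. That is the behaviour I am asking about in 5) and 6): whether Navigator does this consolidation and versioning itself, or whether the NiFi side has to implement something like it.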
Thanks in advance