I would like to capture end-to-end metadata from the source (a Kafka producer) through to final consumption. Assume the producer submits the feed, NiFi picks it up from the Kafka topic, performs a transformation, and saves the result into Hive.
To get this end-to-end lineage, is the following correct:
1) On-board the new feed by manually defining new types in Atlas for the source, e.g. Kafka_Source_1 (the Kafka producer).
2) Assume Atlas has a hook that automatically creates Kafka topics as types in Atlas, e.g. Kafka_Topic_1.
3) As the producer keeps sending messages to Kafka, NiFi reads them, performs the transformation, extracts the technical and business metadata, and then creates the related instances (business glossary terms, operational metadata entities) in Atlas by calling its REST API (a custom implementation in the NiFi data pipeline), including creating the Hive endpoint as a type. (I wish NiFi would manage the metadata internally and export it to Atlas in batches.) Is there an easier way to do this?
4) Thus, from Atlas, I will see the lineage as: Kafka_Source_1 --> Kafka_Topic_1 --> Message_Transformation_112121 --> Hive_Path_1
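To make steps 1 and 3 concrete, here is a minimal sketch of the payloads such a custom NiFi REST call could send. This assumes the Atlas v2 REST API (POST /api/atlas/v2/types/typedefs to register a type, POST /api/atlas/v2/entity to create an instance); the host, the kafka_source typedef name, and the Kafka_Source_1@cluster1 qualified name are all hypothetical:

```python
import json

# Hypothetical Atlas endpoint; adjust host/port for your cluster.
ATLAS_BASE = "http://atlas-host:21000/api/atlas/v2"

def build_typedef(type_name):
    """Payload for POST {ATLAS_BASE}/types/typedefs: register a custom entity type.

    Extending DataSet makes instances eligible to appear as lineage inputs/outputs.
    """
    return {
        "entityDefs": [{
            "name": type_name,
            "superTypes": ["DataSet"],
            "attributeDefs": [],
        }]
    }

def build_entity(type_name, qualified_name, name):
    """Payload for POST {ATLAS_BASE}/entity: create one instance of the type.

    qualifiedName acts as the unique key for the entity.
    """
    return {
        "entity": {
            "typeName": type_name,
            "attributes": {
                "qualifiedName": qualified_name,
                "name": name,
            },
        }
    }

typedef = build_typedef("kafka_source")
entity = build_entity("kafka_source", "Kafka_Source_1@cluster1", "Kafka_Source_1")
print(json.dumps(entity, indent=2))
```

In a NiFi flow this would typically sit in an InvokeHTTP processor or a scripted processor posting these JSON bodies with basic auth against Atlas.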
If the above flow is OK,
5) Assuming NiFi calls the REST API to create the instances on each bulk read from Kafka, will the duplicate definitions be consolidated within Atlas, so that there is a single lineage line from this source to the destination?
6) If the same feed's schema (or any of its business, technical, or operational metadata) changes, how will Atlas capture this change? Will it create a new lineage or a new version of the existing one? In other words, how is dynamic metadata managed?
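For questions 5 and 6, my working assumption is that qualifiedName is the consolidation key: re-posting an entity with the same qualifiedName should update the existing entity rather than create a duplicate, so the lineage line stays single and a schema change becomes an attribute update on the same entity. A sketch under that assumption (the Process type is the standard Atlas way to model lineage between inputs and outputs; the nifi_transform name, the kafka_topic/hive_table type names, and recording a schema version in the description are hypothetical):

```python
def build_process_entity(qualified_name, input_qn, output_qn, schema_version):
    """Payload for POST /api/atlas/v2/entity modeling the NiFi transformation
    as a Process entity; inputs/outputs reference DataSet entities by their
    unique attribute (qualifiedName)."""
    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                # Stable key: repeated POSTs with the same qualifiedName are
                # expected to update the one entity, not add a second lineage line.
                "qualifiedName": qualified_name,
                "name": "nifi_transform",
                "inputs": [{
                    "typeName": "kafka_topic",
                    "uniqueAttributes": {"qualifiedName": input_qn},
                }],
                "outputs": [{
                    "typeName": "hive_table",
                    "uniqueAttributes": {"qualifiedName": output_qn},
                }],
                # Hypothetical way to record a metadata change on the same entity.
                "description": f"schema_version={schema_version}",
            },
        }
    }

first = build_process_entity(
    "nifi_transform@cluster1", "Kafka_Topic_1@cluster1", "Hive_Path_1@cluster1", 1)
# Same qualifiedName, new schema version: intended to be an update, not a duplicate.
second = build_process_entity(
    "nifi_transform@cluster1", "Kafka_Topic_1@cluster1", "Hive_Path_1@cluster1", 2)
```

Whether Atlas then keeps a version history of the old attribute values, or only the latest state plus audit events, is exactly the part of question 6 I would like confirmed.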