
Atlas Metadata Management Approach


Hi all,


I would like to capture end-to-end metadata from the source (a Kafka producer) through to its final consumption. Assume the producer submits a feed, NiFi picks up this feed from the Kafka topic, performs the transformation, and saves the result into Hive.


To get this end-to-end lineage, is the following approach correct?


1) On-board the new feed by manually defining a new type in Atlas for the source, e.g. Kafka_Source_1 (which is the Kafka producer). A rough sketch of what I mean is after point 4 below.

2) Assume Atlas has a hook to create Kafka topics automatically as types in Atlas, e.g. Kafka_Topic_1, etc.


3) As the producer keeps sending messages to Kafka, NiFi reads those messages, performs the transformation, extracts the technical and business metadata, and then creates the related instances (business glossary terms and operational metadata entities) in Atlas by calling its REST API (a custom implementation in the NiFi data pipeline), including creating the Hive endpoint as a type. (I wish NiFi would manage the metadata internally and then export it to Atlas in batches.) Is there an easier way of doing this? The kind of REST call I have in mind is also sketched after point 4.


4) Thus, in Atlas, I would see the lineage as: Kafka_Source_1 --> Kafka_Topic_1 --> Message_Transformation_112121 --> Hive_Path_1
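
For point 1, this is roughly what I mean by manually defining the type, using the Atlas v2 typedefs REST endpoint from Python. The kafka_source type name, its attribute, the host and the credentials are placeholders I made up for this question:

# Rough sketch only: register a custom "kafka_source" entity type in Atlas.
# Host, credentials, type name and attribute are placeholders.
import requests

ATLAS_URL = "http://atlas-host:21000/api/atlas/v2"   # placeholder host/port
AUTH = ("admin", "admin")                            # placeholder credentials

type_defs = {
    "entityDefs": [
        {
            "name": "kafka_source",         # hypothetical type for the producer
            "superTypes": ["DataSet"],      # so instances can appear in lineage
            "attributeDefs": [
                {"name": "owner", "typeName": "string", "isOptional": True,
                 "cardinality": "SINGLE", "isUnique": False, "isIndexable": False}
            ],
        }
    ]
}

resp = requests.post(ATLAS_URL + "/types/typedefs", json=type_defs, auth=AUTH)
resp.raise_for_status()
print(resp.json())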
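
And for points 3 and 4, this is the kind of custom REST call I picture the NiFi pipeline making (e.g. from an InvokeHTTP or ExecuteScript processor) once the data has landed in Hive: a Process entity whose inputs and outputs reference the existing Kafka topic and Hive table entities by qualifiedName, which, as far as I understand, is what Atlas uses to draw the lineage line. Host, credentials, cluster suffix and names are again placeholders:

# Rough sketch only: register one Process entity linking the Kafka topic to
# the Hive table so Atlas shows the lineage edge between them.
# Host, credentials, cluster suffix and qualified names are placeholders.
import requests

ATLAS_URL = "http://atlas-host:21000/api/atlas/v2"
AUTH = ("admin", "admin")

process = {
    "entity": {
        "typeName": "Process",   # or a custom subtype for the NiFi transformation
        "attributes": {
            "qualifiedName": "Message_Transformation_112121@my_cluster",
            "name": "Message_Transformation_112121",
            # reference the endpoints by their unique qualifiedName instead of GUIDs,
            # assuming the topic and table entities already exist (hook-created)
            "inputs":  [{"typeName": "kafka_topic",
                         "uniqueAttributes": {"qualifiedName": "Kafka_Topic_1@my_cluster"}}],
            "outputs": [{"typeName": "hive_table",
                         "uniqueAttributes": {"qualifiedName": "default.table_1@my_cluster"}}],
        },
    }
}

resp = requests.post(ATLAS_URL + "/entity", json=process, auth=AUTH)
resp.raise_for_status()
print(resp.json())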


If the above flow is OK:

5) Assuming NiFi calls the REST API to create the instances for each bulk read from Kafka, will the duplicate definitions be consolidated within Atlas so that there is a single lineage line from this source to the destination? (See the sketch after point 6.)

6) If the same feed has its schema (or any of its business, technical, or operational metadata) changed, how will this change be captured by Atlas? Will it create a new lineage or a new version of the existing one? In other words, how is dynamic metadata managed?
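
On point 5, my (possibly wrong) understanding is that Atlas matches entities on their unique qualifiedName, so re-posting an entity with the same qualifiedName should update it rather than create a duplicate. This is the kind of check I would add in the pipeline; host, credentials and names are placeholders:

# Rough sketch only: look up an entity by its unique qualifiedName before
# deciding whether to create it again. Host, credentials and names are placeholders.
import requests

ATLAS_URL = "http://atlas-host:21000/api/atlas/v2"
AUTH = ("admin", "admin")

def get_by_qualified_name(type_name, qualified_name):
    """Return the entity if it already exists in Atlas, else None."""
    resp = requests.get(
        ATLAS_URL + "/entity/uniqueAttribute/type/" + type_name,
        params={"attr:qualifiedName": qualified_name},
        auth=AUTH,
    )
    if resp.status_code == 404:
        return None
    resp.raise_for_status()
    return resp.json()

existing = get_by_qualified_name("kafka_topic", "Kafka_Topic_1@my_cluster")
if existing is None:
    print("not registered yet - create it")
else:
    print("already in Atlas - re-posting with the same qualifiedName should update it")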


Thanks in advance

CK