Created 05-22-2017 10:00 AM
Hi,
I am trying to implement data lineage for my Spark application. I have a Kafka topic; Spark Streaming reads data from Kafka and writes it to a data source. When I checked Apache Atlas, I found it doesn't provide any hooks for Spark. I guess we have to use the REST API for this implementation. Can someone point me to some documentation or an example for this?
Created 05-22-2017 01:38 PM
You are correct, Atlas does not currently provide lineage for Spark. This is something engineering/community is working on.
You can, however, create your own entities and use the REST API to populate them. Here is some documentation and examples:
http://atlas.apache.org/0.7.0-incubating/AtlasTechnicalUserGuide.pdf
Please note that while this documentation also applies to Atlas 0.7-0.8 (in HDP 2.5-2.6), it uses APIs that have been deprecated in those versions and will be removed in future ones. Still, it's good enough to get you started with your implementation.
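To give a rough idea of what "create your own entities" can look like, here is a minimal Python sketch. It is not taken from the guide above: it assumes the newer v2 endpoint (`/api/atlas/v2/entity`), which I believe is available from Atlas 0.8 onward, rather than the deprecated API the guide describes. The host, port, credentials, the `DataSet` type name and the qualified name are all placeholders, so adjust them to your cluster and your own type definitions.

```python
import base64
import json
import urllib.request

# Assumption: Atlas is reachable here; replace with your own host and port.
ATLAS_URL = "http://localhost:21000"


def dataset_entity(type_name, name, qualified_name):
    """Build the JSON body for a single entity.

    type_name must be a type that already exists in Atlas (a built-in one
    or a custom type you registered yourself).
    """
    return {
        "entity": {
            "typeName": type_name,
            "attributes": {
                "name": name,
                # qualifiedName is the unique key Atlas uses to identify the entity
                "qualifiedName": qualified_name,
            },
        }
    }


def post_entity(payload, user, password, base_url=ATLAS_URL):
    """POST the payload to the (assumed) v2 entity endpoint using basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        base_url + "/api/atlas/v2/entity",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical names, for illustration only.
kafka_source = dataset_entity("DataSet", "events", "events@mycluster")
print(json.dumps(kafka_source, indent=2))
```

Calling `post_entity(kafka_source, "your_user", "your_password")` would send it to Atlas; the sketch only prints the body so you can check it before posting anything.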
As always, if you find any responses here useful, don't forget to "accept" an answer.
Created 05-23-2017 05:16 AM
Thanks for the answer. So I created metadata for my custom object using the REST API, and then, once I retrieved my event from Spark Streaming, I added it as an entity using the REST API. Will Atlas take care of lineage from here, or do I need to add event modifications manually each and every time?
Created 05-23-2017 02:03 PM
Take a look at the "Create Lineage amongst data sets" section (p. 46) in the document link I shared above. It also has a detailed example.
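As a rough illustration of the idea (not the exact example from the document): as far as I know, Atlas draws lineage from entities of type `Process` (or a subtype) whose `inputs` and `outputs` attributes point at data set entities, so your application has to register such a process entity itself. The Python sketch below only builds the request body, in the v2 entity format, which is an assumption on my part since the linked guide uses the older API. The type names and qualified names are hypothetical, and the referenced data sets must already exist in Atlas.

```python
import json


def entity_ref(type_name, qualified_name):
    """Reference an existing entity by its unique qualifiedName."""
    return {
        "typeName": type_name,
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def process_entity(name, qualified_name, inputs, outputs, type_name="Process"):
    """Build a process entity that links input data sets to output data sets."""
    return {
        "entity": {
            "typeName": type_name,
            "attributes": {
                "name": name,
                "qualifiedName": qualified_name,
                "inputs": inputs,
                "outputs": outputs,
            },
        }
    }


# Hypothetical names: a Kafka topic read by a Spark Streaming job that
# writes to some target data set.
lineage = process_entity(
    name="spark_streaming_job",
    qualified_name="spark_streaming_job@mycluster",
    inputs=[entity_ref("DataSet", "events@mycluster")],
    outputs=[entity_ref("DataSet", "events_sink@mycluster")],
)
print(json.dumps(lineage, indent=2))
```

You would POST this body to the entity endpoint the same way you post any other entity. Registering the process once per job (not once per event) is usually enough for the lineage graph to show source, job and target.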
Created 05-24-2017 05:55 AM
Yes, got it, @Eyad Garelnabi. Thanks.
Created 08-05-2021 09:17 PM