Apache Atlas Spark Data lineage

Rising Star

Hi,

I am trying to implement data lineage for my Spark application. I have a Kafka topic; Spark Streaming reads data from Kafka and writes it to a data source. When I checked Apache Atlas, I found it doesn't provide any hooks for Spark. I guess we have to use the REST API for this implementation. Can someone point me to some documentation or an example for this?

1 ACCEPTED SOLUTION

@vnandigam

You are correct: Atlas does not currently provide lineage for Spark. This is something engineering and the community are working on.

You can, however, create your own entities and use the REST API to populate them. Here is some documentation and examples:

http://atlas.apache.org/0.7.0-incubating/AtlasTechnicalUserGuide.pdf

Please note that while this documentation also applies to Atlas 0.7-0.8 (in HDP 2.5-2.6), it uses APIs that have been deprecated in those versions and will be removed in future ones. Still, it's a good way to get started with your implementation.
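To give a rough idea of what "create your own entities and use the REST API" can look like, here is a minimal Python sketch. Everything specific in it is an assumption rather than something taken from the guide above: the host and port, the admin/admin credentials, the `/api/atlas/v2/entity` endpoint (the newer v2 API, not the deprecated one the PDF describes), the generic `DataSet` type, and the `events_raw@mycluster` name. Check each of these against your own Atlas version before relying on it.

```python
import base64
import json
import urllib.request

# Assumption: Atlas runs locally on its usual default port.
ATLAS_URL = "http://localhost:21000"


def dataset_entity(type_name, qualified_name, name):
    """Build a v2-style entity payload; qualifiedName acts as the unique key."""
    return {
        "entity": {
            "typeName": type_name,
            "attributes": {"qualifiedName": qualified_name, "name": name},
        }
    }


def post_entity(payload, user="admin", password="admin"):
    """POST the payload to Atlas (assumed v2 endpoint and basic auth)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        ATLAS_URL + "/api/atlas/v2/entity",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical data set written by the Spark Streaming job.
payload = dataset_entity("DataSet", "events_raw@mycluster", "events_raw")
print(json.dumps(payload, indent=2))
# post_entity(payload)  # uncomment once the URL and credentials are correct
```

The payload builder is kept separate from the HTTP call so you can inspect the JSON before sending anything to a live server.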

As always, if you find any responses here useful, don't forget to "accept" an answer.

5 REPLIES

Rising Star

@Eyad Garelnabi

Thanks for the answer. I created metadata for my custom object using the REST API, and once I retrieve an event from Spark Streaming, I add it as an entity using the REST API. Will Atlas take care of the lineage, or do I need to add the event modifications manually each time?

Take a look at the "Create Lineage amongst data sets" section (p. 46) in the document link I shared above. It also has a detailed example.
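As a hedged illustration of the idea: as far as I know, Atlas does not infer lineage from the data set entities alone. It draws lineage from process entities whose `inputs` and `outputs` point at those data sets. The sketch below only builds such a payload in Python. The v2-style JSON shape, the built-in `Process` and `kafka_topic` type names, and all the `@mycluster` names are assumptions for illustration, and the referenced entities are assumed to already exist in Atlas.

```python
import json


def ref(type_name, qualified_name):
    """Reference an existing entity by its unique qualifiedName."""
    return {
        "typeName": type_name,
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def process_entity(qualified_name, name, inputs, outputs):
    """Build a process payload that links input data sets to output data sets."""
    return {
        "entity": {
            "typeName": "Process",  # assumption: built-in base process type
            "attributes": {
                "qualifiedName": qualified_name,
                "name": name,
                "inputs": inputs,
                "outputs": outputs,
            },
        }
    }


# Hypothetical names: a Kafka topic feeding a data set via one Spark job.
lineage = process_entity(
    "spark_stream_job@mycluster",
    "spark_stream_job",
    inputs=[ref("kafka_topic", "events@mycluster")],
    outputs=[ref("DataSet", "events_raw@mycluster")],
)
print(json.dumps(lineage, indent=2))
```

If this matches how your Atlas version behaves, you would register one process entity per job (or per transformation step), rather than one per streamed event.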

Rising Star

Yes, got it @Eyad Garelnabi. Thanks.
