How to create data lineage in Atlas for data that is copied from local to HDFS and transformed/processed by Spark SQL

New Contributor

I have two datasets that were copied from local to HDFS, then joined and transformed using Spark SQL and stored as a single dataset in HDFS. I was able to capture the metadata and push it to Atlas through the Atlas REST API, since it provides POST methods for pushing JSON into Atlas. For data lineage, however, I could only find a GET method. How do I create data lineage in this scenario?

1 ACCEPTED SOLUTION

Re: How to create data lineage in Atlas for data that is copied from local to HDFS and transformed/processed by Spark SQL

Rising Star

Lineage is generated from entities of the type definitions called Process and DataSet. When you create these with sufficient information, describing the "Process" that copies a "DataSet" from local to HDFS and, similarly, what happens in the Spark realm, Atlas should be able to generate the lineage info for you.

All you need is to create the Process and DataSet entities for the above scenario. HTH
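
To make that concrete, here is a minimal sketch of the JSON such a request might carry for the Spark SQL join step. It is not taken from the thread: the paths, the cluster name, the process name, and the "path@cluster" qualifiedName convention are illustrative assumptions, and the helper functions are hypothetical. It relies on the built-in hdfs_path and Process types, and on negative guid placeholders so the Process can reference datasets created in the same request.

```python
import json


def hdfs_path_entity(guid, path, cluster):
    """Build one DataSet entity of the built-in hdfs_path type."""
    return {
        "typeName": "hdfs_path",
        "guid": guid,  # negative placeholder, replaced by Atlas on creation
        "attributes": {
            # Assumed naming convention; any unique string works.
            "qualifiedName": f"{path}@{cluster}",
            "name": path.rstrip("/").split("/")[-1],
            "path": path,
            "clusterName": cluster,
        },
    }


def build_lineage_payload(input_paths, output_path, process_name, cluster):
    """Build a bulk-create payload: N input datasets, 1 output, 1 Process."""
    all_paths = list(input_paths) + [output_path]
    datasets = [
        hdfs_path_entity(f"-{i + 1}", path, cluster)
        for i, path in enumerate(all_paths)
    ]
    refs = [{"guid": d["guid"], "typeName": d["typeName"]} for d in datasets]
    process = {
        "typeName": "Process",
        "guid": f"-{len(datasets) + 1}",
        "attributes": {
            "qualifiedName": f"{process_name}@{cluster}",
            "name": process_name,
            "inputs": refs[:-1],   # everything except the last dataset
            "outputs": refs[-1:],  # the joined result
        },
    }
    return {"entities": datasets + [process]}


# Hypothetical paths and names, for illustration only.
payload = build_lineage_payload(
    ["/data/raw/customers", "/data/raw/orders"],
    "/data/curated/customer_orders",
    "spark_sql_join",
    "mycluster",
)
print(json.dumps(payload, indent=2))
```

A payload of this shape would typically be sent with POST to the /api/atlas/v2/entity/bulk endpoint of your Atlas server; host, port, and credentials depend on the deployment. The lineage itself is then read back with the GET method the question mentions, /api/atlas/v2/lineage/{guid}, using the guid Atlas assigns to one of the datasets. The local-to-HDFS copy can be modelled the same way, with a second Process whose input is a DataSet describing the local file.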

Re: How to create data lineage in Atlas for data that is copied from local to HDFS and transformed/processed by Spark SQL

New Contributor

Thanks for your response. I found the technical details of the approach you mentioned at this link.