Created 04-17-2017 12:37 PM
I have two datasets that were copied from the local file system to HDFS, joined and transformed using Spark SQL, and stored as a single dataset in HDFS. I was able to capture the metadata and push it to Atlas through the Atlas REST API, which provides POST methods for pushing JSON into Atlas. For data lineage, however, I could only find GET methods. How do I create data lineage in this scenario?
Created 04-18-2017 10:48 PM
Lineage is generated from entities of the Process and DataSet type definitions. If you create these entities with enough information to describe the "Process" that copies each "DataSet" from Local to HDFS, and likewise for what happens in the Spark job, Atlas should be able to generate the lineage for you.
All you need to do is create the Process and DataSet entities for the above scenario. HTH
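A minimal sketch of that idea in Python, assuming the Atlas v2 REST endpoint `/api/atlas/v2/entity/bulk` and the built-in `hdfs_path` (a DataSet subtype) and `Process` types. The host, credentials, cluster name, paths, and job name are hypothetical placeholders, and your Atlas version may require extra attributes or a custom Process subtype. The point is that lineage is not POSTed separately: it comes from the Process entity's `inputs` and `outputs` pointing at the DataSet entities.

```python
import base64
import itertools
import json
import urllib.request

# Hypothetical Atlas server and cluster name; replace with your own values.
ATLAS_URL = "http://localhost:21000"
CLUSTER = "cluster1"

_temp_ids = itertools.count(1)


def _temp_guid():
    # Negative GUIDs mark entities created in the same request, so the
    # Process can reference the datasets before Atlas assigns real GUIDs.
    return f"-{next(_temp_ids)}"


def hdfs_path_entity(path):
    """Build an hdfs_path entity (a DataSet subtype in Atlas's Hadoop model)."""
    return {
        "typeName": "hdfs_path",
        "guid": _temp_guid(),
        "attributes": {
            "qualifiedName": f"{path}@{CLUSTER}",
            "name": path.rstrip("/").split("/")[-1],
            "path": path,
        },
    }


def ref(entity):
    """Reference another entity in the same request by its temporary GUID."""
    return {"typeName": entity["typeName"], "guid": entity["guid"]}


def lineage_payload(input_paths, output_path, job_name):
    """Two (or more) input datasets -> one Spark process -> one output dataset."""
    inputs = [hdfs_path_entity(p) for p in input_paths]
    output = hdfs_path_entity(output_path)
    process = {
        "typeName": "Process",
        "guid": _temp_guid(),
        "attributes": {
            "qualifiedName": f"{job_name}@{CLUSTER}",
            "name": job_name,
            # These inputs/outputs links are what Atlas follows to draw lineage.
            "inputs": [ref(e) for e in inputs],
            "outputs": [ref(output)],
        },
    }
    return {"entities": inputs + [output, process]}


def post_entities(payload, user="admin", password="admin"):
    """POST the entities to the v2 bulk endpoint and return Atlas's JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{ATLAS_URL}/api/atlas/v2/entity/bulk",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For the scenario in the question, something like `post_entities(lineage_payload(["/data/in/a", "/data/in/b"], "/data/out/joined", "spark_join"))` would register both source datasets, the joined output, and the Spark step in one request. The copy from Local to HDFS could be modeled the same way with a second Process whose input is a local-file DataSet.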
Created 06-23-2017 12:27 PM
Thanks for your response. I found the technical details of the approach you mentioned in this link.