Created 11-29-2016 02:34 PM
The data lineage graph that Apache Atlas generates when Hive views are involved presents a dilemma. The Hive process that creates a Hive view is linked not only to the views declared in its query, but also to the views that were used to create those views, and so on recursively until the lineage reaches the tables from which the views were ultimately built.
I want to know whether this lineage is part of Apache Atlas's design philosophy for presenting lineage graphs that contain Hive views, or whether it is an untested case that should be fixed.
Created 12-16-2016 07:14 AM
This is by design. Atlas, like most governance tools, traces lineage as far back as possible. Atlas will not only go back to the root table(s); it can even go as far back as the Storm or Sqoop job that ingested the data into the original tables.
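To illustrate the behavior described in the question, here is a minimal sketch of transitive upstream lineage over a toy graph. It assumes a hypothetical in-memory mapping from each entity to the entities it was derived from; the entity names and the `upstream_lineage` helper are illustrative only and are not the Atlas API or its data model. With no depth limit it walks all the way back to the ingesting job, and with `max_depth=1` it returns only the direct inputs of the query.

```python
from collections import deque

# Hypothetical lineage graph (NOT the Atlas data model): entity -> direct inputs.
UPSTREAM = {
    "view_c": ["view_b"],
    "view_b": ["view_a", "table_2"],
    "view_a": ["table_1"],
    "table_1": ["sqoop_import_job"],
    "table_2": [],
    "sqoop_import_job": [],
}


def upstream_lineage(entity, graph, max_depth=None):
    """Breadth-first walk over the inputs of `entity`.

    Returns every entity that (transitively) feeds `entity`, in BFS order.
    If `max_depth` is given, stop expanding after that many hops.
    """
    seen = set()
    order = []
    # Each queue item is (entity, number of hops from the starting entity).
    queue = deque((parent, 1) for parent in graph.get(entity, []))
    while queue:
        node, depth = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Only follow further inputs while within the depth limit (if any).
        if max_depth is None or depth < max_depth:
            queue.extend((parent, depth + 1) for parent in graph.get(node, []))
    return order


print(upstream_lineage("view_c", UPSTREAM))               # full lineage
print(upstream_lineage("view_c", UPSTREAM, max_depth=1))  # direct inputs only
```

The full walk reaching `sqoop_import_job` mirrors the "as far back as possible" behavior; the depth-limited call corresponds to the narrower view the question expected.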
The purpose of tracing lineage this far back is to let users effectively trace the origins of their data, whether to validate data quality, for compliance, or simply to understand how the data has mutated or evolved into its current state.