Data Lineage Graph with Hive views

The data lineage graph that Apache Atlas generates when Hive views are involved presents a dilemma. The Hive process that creates a Hive view is linked not only to the views declared in its query, but also to the views used to create those views, and so on recursively, until the graph reaches the tables underlying the views.

I would like to know whether this lineage is part of Apache Atlas's design philosophy for presenting lineage graphs that contain Hive views, or whether it is an untested case that should be fixed.

ACCEPTED SOLUTION

@Radhouene EL HADJ EL ARBI

This is by design. Atlas, like most governance tools, traces lineage as far back as possible. Atlas will not only go back to the root table(s); it can even go as far back as the Storm or Sqoop job that ingested the data into the original tables.
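To illustrate why a view's lineage reaches past the views named in its own query, here is a minimal Python sketch that models lineage as a hypothetical mapping from each entity to its direct inputs and walks that mapping breadth-first. The entity names (`view_a`, `table_1`, `sqoop_import_job`, ...) and the functions `upstream` and `roots` are illustrative assumptions, not part of the Atlas API; Atlas's real model, which links datasets through process entities, is richer than this.

```python
from __future__ import annotations

from collections import deque

# Hypothetical lineage: each entity maps to the entities it was directly derived from.
# Illustrative only; this is not the Atlas data model or REST API.
LINEAGE: dict[str, list[str]] = {
    "view_c": ["view_b"],
    "view_b": ["view_a", "table_2"],
    "view_a": ["table_1"],
    "table_1": ["sqoop_import_job"],  # table populated by an ingest job
    "table_2": [],
    "sqoop_import_job": [],
}


def upstream(entity: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every entity that `entity` transitively depends on, nearest first (BFS)."""
    visited = {entity}
    order: list[str] = []
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in visited:  # guard against revisiting shared inputs
                visited.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def roots(entity: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return the upstream entities that have no inputs of their own."""
    return [e for e in upstream(entity, lineage) if not lineage.get(e)]


if __name__ == "__main__":
    print(LINEAGE["view_c"])           # direct inputs only
    print(upstream("view_c", LINEAGE))  # full transitive lineage
    print(roots("view_c", LINEAGE))     # origins of the data
```

Reading only the direct inputs of `view_c` yields `['view_b']`, while the transitive walk also surfaces `view_a`, both tables, and the ingest job, which mirrors the full-depth lineage graph described in the answer.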

The purpose of tracing lineage this far back is to let users follow data to its origins, whether to validate data quality, to meet compliance requirements, or simply to understand how the data has changed and evolved into its current state.
