Created 09-15-2021 09:44 AM
Hi Support team,
I would appreciate your support.
I am running some tests to evaluate Atlas data lineage in Cloudera by importing a Hive DB.
I ran several tests and also followed this example in full: https://community.cloudera.com/t5/Community-Articles/Using-Apache-Atlas-to-view-Data-Lineage/ta-p/24...
To migrate all the entities into Atlas, I successfully ran the import-hive.sh script located in the Atlas hook-bin folder.
After the migration, Atlas shows the Hive test DB with all its entities, but I cannot see any lineage (which should show the join between two tables that creates a third one). Atlas only displays the message "No lineage data found".
I also noticed that if I create EXTERNAL tables rather than managed tables, something appears in the Lineage section, but it is incomplete and does not show the dependencies with the source table names.
Could you please help me understand why the data lineage is not showing?
I also checked the log at /var/logs/atlas/application.log. Everything seems fine except for a strange error, "ERROR - [etp1881561036-238:] ~ Exception while fetching groups (AtlasAbstractAuthenticationProvider:137)java.io.IOException: No groups found for user HTTP", which does not seem to be related to this issue. I followed the hints in this thread to fix that error, but it still appears: https://cloudera.ericlin.me/2018/08/webhcat-request-failed-with-error-id-http-no-such-user/
Thanks a lot in advance for your support.
Best Regards,
Created 09-16-2021 06:57 AM
Hi @dansteu,
First of all, executing the import-hive.sh command will only import Hive entities (DBs, tables, columns) into Atlas. It won't create the inter-table lineage.
The only way to see lineage is for Atlas to fetch those entities from Hive automatically, using the Atlas hook. To see lineage, you need the following components in a healthy state: Hive, HBase, Kafka, Solr and Atlas.
The article you referred to is correct; note that it doesn't ask you to run import-hive.sh at all. Hope this helps.
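If you want to check whether Atlas actually holds lineage for a table, independently of the UI message, one option is the Atlas v2 lineage REST endpoint, GET /api/atlas/v2/lineage/{guid}, and to look at whether the "relations" list in the response is empty. The Python sketch below assumes the usual AtlasLineageInfo JSON shape (baseEntityGuid, guidEntityMap, relations); the helper names, the host/port and the GUID are hypothetical placeholders, and it does not handle authentication (e.g. Kerberos/SPNEGO), which a secured cluster will require.

```python
import json
import urllib.parse


def lineage_url(atlas_base: str, guid: str, direction: str = "BOTH", depth: int = 3) -> str:
    """Build an Atlas v2 lineage URL for an entity GUID (hypothetical helper).

    direction is one of INPUT, OUTPUT or BOTH; depth limits how many hops are returned.
    """
    query = urllib.parse.urlencode({"direction": direction, "depth": depth})
    return f"{atlas_base.rstrip('/')}/api/atlas/v2/lineage/{urllib.parse.quote(guid)}?{query}"


def has_lineage(lineage_json: str) -> bool:
    """Return True if an AtlasLineageInfo response contains at least one relation.

    If no process captured by the hook (e.g. a CTAS or INSERT ... SELECT)
    links this entity to others, 'relations' is empty; that is the case the
    UI reports as "No lineage data found".
    """
    info = json.loads(lineage_json)
    return bool(info.get("relations"))


if __name__ == "__main__":
    # Placeholder host, port and GUID: replace them with values from your cluster.
    print(lineage_url("http://atlas-host:21000", "1234-abcd"))
    empty = '{"baseEntityGuid": "1234-abcd", "guidEntityMap": {}, "relations": []}'
    print(has_lineage(empty))
```

Consistent with the point above, a table that only went through import-hive.sh would be expected to come back with an empty "relations" list.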
Regards,
Saurabh
Created 09-17-2021 12:34 AM
Hi @slambe,
Thanks a lot for your assistance.
So, as I understand it, only metadata that is fed into Atlas automatically will include data lineage information.
Could you please point me to the official Cloudera documentation for setting up the Atlas Hive hook? I mean the hive-site.xml settings and any other configuration we need to take care of, for example in Kafka.
Thanks a lot for your support.
Best Regards,
Daniele.
Created 09-17-2021 08:09 AM
@dansteu you are correct. Lineage is created only when Atlas fetches Hive metadata automatically (with no manual intervention).
Please refer to this Apache Atlas document to set up the Atlas Hive hook:
https://atlas.apache.org/1.2.0/Hook-Hive.html
BTW, when you install Atlas in an HDP cluster, these properties are set automatically and no intervention is needed, so you may only need to confirm that they are set.
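One way to confirm the main property is to check that hive-site.xml lists the Atlas hook class, org.apache.atlas.hive.hook.HiveHook, in hive.exec.post.hooks, which is the setting described on the Apache Atlas Hive hook page linked above. The Python sketch below checks only that one property; it does not verify the other steps from that page (for example HIVE_AUX_JARS_PATH in hive-env.sh, or atlas-application.properties in the Hive conf directory). The file path in the comment is an assumption; adjust it to your installation.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATLAS_HIVE_HOOK = "org.apache.atlas.hive.hook.HiveHook"


def read_hive_property(hive_site_xml: str, name: str) -> Optional[str]:
    """Return the value of a <property> in hive-site.xml content, or None if absent."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def atlas_hook_enabled(hive_site_xml: str) -> bool:
    """True if the Atlas Hive hook is listed in hive.exec.post.hooks.

    hive.exec.post.hooks is a comma-separated list of hook classes, so other
    hooks may be configured alongside the Atlas one.
    """
    value = read_hive_property(hive_site_xml, "hive.exec.post.hooks")
    if not value:
        return False
    hooks = [h.strip() for h in value.split(",")]
    return ATLAS_HIVE_HOOK in hooks


if __name__ == "__main__":
    # Inline example content. On a real node you would read the file instead,
    # e.g. open("/etc/hive/conf/hive-site.xml").read()  (path is an assumption).
    sample = """<configuration>
      <property>
        <name>hive.exec.post.hooks</name>
        <value>org.apache.hadoop.hive.ql.hooks.ATSHook,org.apache.atlas.hive.hook.HiveHook</value>
      </property>
    </configuration>"""
    print(atlas_hook_enabled(sample))
```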
Regards,
Saurabh
Created 09-20-2021 03:37 AM
Hi @slambe,
Thanks for the link to the documentation.
I followed it in detail, and now tables are sent automatically from Hive to Atlas without running any manual script. Thanks!
Unfortunately, we still get "No lineage data found".
For example, I tried this simple use case: https://community.cloudera.com/t5/Community-Articles/Using-Apache-Atlas-to-view-Data-Lineage/ta-p/24... but the branch_intersect table still doesn't show any data lineage.
Is there a Kafka topic that needs to be configured for data lineage? The guide I followed doesn't mention any, but I found references to them in other Cloudera forum threads when I searched for "No lineage data found".
Thanks,
Daniele.
Created 09-23-2021 05:17 AM
@dansteu Can you share the output of these commands? Please run them from the Kafka broker host, after first authenticating as the Kafka principal (e.g. with kinit):
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --list --zookeeper `hostname -f`:2181
And the second command:
https://gitlab.com/saurabhlambe/Atlas-stuff/-/blob/master/Atlas%20API.md#c-describe-atlas-consumer-g...
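For reference, the Atlas Hive hook publishes its messages to a Kafka topic that the Atlas server consumes, and by default Atlas uses the topic names ATLAS_HOOK and ATLAS_ENTITIES (a cluster may be configured with different names). A minimal Python sketch that checks the output of the kafka-topics.sh --list command above for those two default names; the function name is hypothetical and the topic names are an assumption if your configuration overrides them:

```python
# Default Atlas topic names; adjust if your Atlas configuration overrides them.
EXPECTED_ATLAS_TOPICS = ("ATLAS_HOOK", "ATLAS_ENTITIES")


def missing_atlas_topics(kafka_topics_output: str, expected=EXPECTED_ATLAS_TOPICS):
    """Return the expected Atlas topics that are absent from `kafka-topics.sh --list` output.

    --list prints one topic name per line; blank lines are ignored.
    """
    present = {line.strip() for line in kafka_topics_output.splitlines() if line.strip()}
    return [topic for topic in expected if topic not in present]


if __name__ == "__main__":
    # Example output pasted as a string; in practice, save the command output
    # to a file and pass its contents here.
    sample_output = "ATLAS_ENTITIES\n__consumer_offsets\n"
    missing = missing_atlas_topics(sample_output)
    print("missing topics:", ", ".join(missing) if missing else "none")
```

If ATLAS_HOOK is missing, hook messages from Hive have nowhere to go, so it is worth checking before looking at the consumer group.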
Created 09-24-2021 05:42 AM
Hi @slambe
I solved the issue and I am now able to see Data Lineage correctly.
Thanks a lot for the support.
Best Regards,
Daniele.
Created 09-24-2021 05:46 AM
Glad the issue has been resolved, @dansteu. Could you please accept my answer as the solution?
This will help the community find answers in the future. Thanks!
Created 10-27-2021 08:43 AM
Hi @slambe,
Still related to this thread, I would like to know how to see the data lineage of an existing Hive DB. Let me explain better: if I generate data lineage by running the scripts from the article https://community.cloudera.com/t5/Community-Articles/Using-Apache-Atlas-to-view-Data-Lineage/ta-p/24... it now works perfectly. But imagine proposing this solution for an existing production environment with tons of tables and relations. The creation scripts are not available to me, although I would guess Atlas can read table fields and dependencies. How could data lineage be made available in Atlas in this case? The logic between the tables is hidden from Atlas, because the structures were defined when the Hive DB was created, years ago, and it is not a live process.
Thanks,
Best Regards,
Daniele.