Lineage is not visible for Hive Table in Atlas

Explorer

Hello Team,

We are using HDP-2.6.5.0 and Ambari-2.6.2.0.

We ran the import-hive.sh script to import our existing Hive tables into Atlas, and it completed successfully. We can now see all Hive databases and tables in Atlas, but we cannot see the data lineage of the imported tables.

If we create an external table on any HDFS path, we can see its lineage in Atlas.

Also, if we create a managed table, we cannot see its lineage in Atlas.

Why are we not getting lineage for older tables and new managed tables?

Please suggest. We are stuck now.

Thanks,

Owez

13 REPLIES

Hi @Owez Mujawar - can you attach some screenshots or something to explain what you mean? Do you see anything inside the lineage box at all in Atlas? You definitely won't see the lineage of anything that happened before you ran the import-hive.sh script, but anything that happens to those tables afterwards should be visible.

It sounds to me like maybe the Hive hook isn't running somehow. Can you verify that the ATLAS_HOOK Kafka topic is created, that the permissions are right (as per this document), and that messages are going through (perhaps check the Ranger audit logs to see)? Also check over some of the properties in this article in case you see any differences.

Explorer

Hi Team,

We are using HDP-2.6.5.0, Ambari-2.6.2.0, and Atlas-0.8.0.

We can now see the lineage of older tables as well, but only for external tables.

When we create new managed tables, or tables that join or intersect two tables, we cannot see their lineage. If we create the new tables as external tables, however, we can see their lineage after running the import-hive.sh script.

Please let me know: can we see lineage for external tables only, or for managed tables as well?

@team Please let me know what we can and cannot do in Atlas.

Super Collaborator
@Owez Mujawar

import-hive.sh does not capture lineage for internal (managed) tables.

When the Hive hook is enabled and you execute queries, you can see the lineage in Atlas.

Explorer

@Sharmadha Sainath @Ana Gillan

Hi Team,

We are using HDP-2.6.5.0, Ambari-2.6.2.0, and Atlas-0.8.0.

Questions:

1) Can we see the lineage of managed tables and virtual views in Atlas?

2) When we create new tables, we cannot see their lineage at first. Only after running the import-hive.sh script can we see the lineage of the newly created tables. Why is this happening? The lineage of newly created tables should appear automatically.

3) Is there any property we need to set, or any configuration we need to change?

Two screenshots are attached:

1) atlas_external table: we can see the lineage of this table after running the import-hive.sh script.

2) atlas_managed table: we cannot see the lineage of this table even after running the script.

95384-atlas-externaltable.png

95385-atlas-managedtable.png

Please let me know: can we see lineage for external tables only, or for managed tables and virtual views as well?

There are no errors in application.log, hbase-log, ranger-audit, or kafka-log. Permissions have been set in Ranger.

Please suggest. We are stuck now.

Thanks,

Owez

Super Collaborator
@Owez Mujawar

Once you enable the Hive hook, you should see the tables, lineage, and all Hive DDL changes in Atlas. Please refer to http://atlas.apache.org/Hook-Hive.html to enable the Hive hook.

import-hive.sh is a one-time script (though there is no harm in running it multiple times) that is used when onboarding Atlas for the first time, to sync Hive metadata with Atlas. The script contacts the Hive metastore, builds Hive entities from what it finds, and sends them to Atlas. It has no way of knowing how a CTAS table was created or what its source was, so it cannot create lineage for it. It can, however, determine the details of an external table.

The hook captures live events and sends the information to Atlas.
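
To illustrate the difference described above, here is a minimal sketch of the kind of statements the hook would be expected to turn into lineage when they run with the hook enabled. The names employee_src, employee_copy and employee_copy_v are hypothetical and used only for illustration.

```sql
-- Hypothetical source table. A plain CREATE TABLE reads from no other
-- table, so on its own it is not expected to produce a lineage graph.
CREATE TABLE IF NOT EXISTS employee_src (eid INT, name STRING);

-- CTAS: reads employee_src and writes employee_copy in one query, so the
-- hook can record employee_src as the input of employee_copy.
CREATE TABLE employee_copy AS
SELECT eid, name
FROM employee_src
WHERE eid > 0;

-- A view is also defined by a query over a source, so the hook can
-- record employee_copy as the input of employee_copy_v.
CREATE VIEW employee_copy_v AS
SELECT eid, name
FROM employee_copy;
```

The metastore keeps only the resulting table definitions, not the query that produced them, which is why a later import cannot reconstruct this relationship.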

Explorer

@Sharmadha Sainath

The Hive hook is enabled, and we can see all tables, but lineage only for external tables.

We want to see the lineage of managed tables and virtual views that were created before Atlas.

Could you please help me see the lineage of these managed tables and virtual views?

I have not seen any errors in ranger-log, kafka-log, or hive-log.

Only in application-log can I see the error below.

2018-12-11 00:12:48,690 ERROR - [pool-2-thread-7 - 3b35712a-6c84-4102-a19c-cedf26ea5e1c:] ~ graph rollback due to exception AtlasBaseException:Instance __AtlasUserProfile with unique attribute {name=admin} does not exist (GraphTransactionInterceptor:73)
2018-12-11 00:11:19,247 INFO  - [main-SendThread(hadmgrndcc03-1.lifeway.org:2181):] ~ Opening socket connection to server hadmgrndcc03-1.lifeway.org/172.17.20.29:2181. Will not attempt to authenticate using SASL (unknown error) (ClientCnxn:1019)
2018-12-11 00:11:30,951 INFO  - [main-SendThread(hadmgrndcc03-3.lifeway.org:2181):] ~ Opening socket connection to server hadmgrndcc03-3.lifeway.org/172.17.20.33:2181. Will not attempt to authenticate using SASL (unknown error) (ClientCnxn:1019)

Please let me know how we can see the lineage of managed tables and virtual views.

Super Collaborator

@Owez Mujawar

It is not possible to see the lineage of tables created before the Atlas Hive hook was enabled.

Once the Hive hook is enabled, you will be able to see lineage for the tables created from then on.

Let's consider the scenario:

1. Atlas is not installed on the cluster and no Hive hook is enabled. Tables, views, etc. (table1, view1) are created in Hive.

2. Install Atlas on the cluster. Run import-hive.sh to let the script contact the Hive metastore and populate Atlas with the existing Hive data. This script creates all tables, but lineage is not created (except for external tables).

3. Now enable the Hive hook, then create tables and views. Since the Hive hook is enabled, the lineage should be created in Atlas without running any import script. With the hook enabled, we don't need to run the import-hive.sh script anymore.

Explorer

@Sharmadha Sainath

Thanks for the confirmation.

The scenario you explained fits our environment perfectly, but step 3 is not working for us.

For example, we are using the queries below to create a managed table, a view, and an external table:

CREATE TABLE IF NOT EXISTS employee13 ( eid int, name int) ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/hive/data/employee13';
CREATE VIEW employee_data AS SELECT * FROM employee WHERE eid >0;
CREATE EXTERNAL TABLE employee_data2 (  eid int, name int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' LOCATION '/hive/data/employee';

When we execute the above queries, they run successfully. We don't see any errors in the logs.

When we log in to the Atlas UI, we cannot see the newly created tables at first.

After running the import-hive.sh script, we can see the lineage of external tables only. We cannot see the lineage of the newly created managed table and view; Atlas reports that no lineage was found.

Could you please guide me on this?

Super Collaborator

@Owez Mujawar

It looks like the Hive hook messages are not being read by Atlas.

Can you check that the Hive hook is enabled correctly, and check the following permissions?

hive's permission to publish to the ATLAS_HOOK topic

atlas's permission to read from the ATLAS_HOOK topic and publish to the ATLAS_ENTITIES topic.
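
As a quick first check of whether the hook is enabled, the following can be run from the same Hive session that creates the tables. This is only a sketch, and it assumes the hook is registered through the hive.exec.post.hooks property, as described on the Atlas Hive hook page.

```sql
-- Prints the current value of the property for this session.
-- For the Atlas hook to fire, the value should include
-- org.apache.atlas.hive.hook.HiveHook (possibly alongside other hooks).
SET hive.exec.post.hooks;
```

This confirms only the session configuration. It does not verify the Kafka topic permissions listed above, which still need to be checked separately.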