Hive metadata does not show up in Atlas with hook

Contributor

I have Hive and Atlas installed on HDP 2.6.5 with Hive hook for Atlas enabled and no changes to the configuration. I am able to successfully import Hive metadata with import-hive.sh, but the Hive hook does not seem to work. When I create a database in Hive, it does not show up in Atlas.

The only things I see in the logs are

atlas/application.log:

ERROR - [pool-2-thread-5 - de30e17a-1db7-4aad-8f34-a61a27b33cff:] ~ graph rollback due to exception AtlasBaseException:Instance __AtlasUserProfile with unique attribute {name=admin} does not exist (GraphTransactionInterceptor:73)

hive logs:

./hiveserver2.log.2018-09-10:2018-09-10 14:55:42,587 INFO [HiveServer2-Background-Pool: Thread-61]: hook.AtlasHook (AtlasHook.java:<clinit>(99)) - Created Atlas Hook

./hiveserver2.log:2018-09-11 09:04:20,100 INFO [HiveServer2-Background-Pool: Thread-3201]: log.PerfLogger (PerfLogger.java:PerfLogBegin(149)) - <PERFLOG method=PostHook.org.apache.atlas.hive.hook.HiveHook from=org.apache.hadoop.hive.ql.Driver>

When I search for the database name in /kafka-logs/, it only shows up in ./ATLAS_HOOK-0/00000000000000000014.log, and the entry looks like

{"msgSourceIP":"172.18.181.235","msgCreatedBy":"hive","msgCreationTime":1536681860109,"message":{"entities":{"referredEntities":{},"entities":[{"typeName":"hive_db","attributes":{"owner":"hive","ownerType":"USER","qualifiedName":"oyster9@bigcentos","clusterName":"bigcentos","name":"oyster9","location":"hdfs://host.com:8020/apps/hive/warehouse/oyster9.db","parameters":{}},"guid":"-82688521591125","version":0}]},"type":"ENTITY_CREATE_V2","user":"hive"},"version":{"version":"1.0.0"},"msgCompressionKind":"NONE","msgSplitIdx":1,"msgSplitCount":1}

This tells me that the message about the new database is published to the Kafka topic but is not read by Atlas.

I do not know where to look next, and my goal is to make it so that when a database is created in Hive, it shows up in Atlas automatically.

1 ACCEPTED SOLUTION

Contributor

Changing offsets.topic.replication.factor in the Kafka config to 1 (the number of brokers) addressed the issue.
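
For context, a likely explanation (an assumption, not something confirmed in this thread): consumers that connect through the brokers keep their offsets in the internal __consumer_offsets topic. Kafka's default for offsets.topic.replication.factor is 3, and the topic cannot be created while fewer brokers than that are available, so on a single-broker cluster such a consumer never starts reading. The sketch below compares the configured value with the broker count. The function name check_offsets_rf is hypothetical, and the properties file path is whatever your install uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kafka or Atlas): compare
# offsets.topic.replication.factor in a broker properties file
# with the number of brokers in the cluster.
# Usage: check_offsets_rf <properties-file> <broker-count>
check_offsets_rf() {
    props_file=$1
    broker_count=$2
    # Last uncommented assignment wins; extract its numeric value.
    rf=$(sed -n 's/^[[:space:]]*offsets\.topic\.replication\.factor[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$props_file" | tail -n 1)
    # If the property is not set, Kafka's default of 3 applies.
    rf=${rf:-3}
    if [ "$rf" -gt "$broker_count" ]; then
        echo "TOO_HIGH rf=$rf brokers=$broker_count"
        return 1
    fi
    echo "OK rf=$rf brokers=$broker_count"
}
```

For example, `check_offsets_rf /path/to/server.properties 1` would report TOO_HIGH on a single-broker cluster that still has the default. On Ambari-managed clusters the value should be changed in the Kafka service configuration rather than by editing the file directly, and the brokers restarted afterwards.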

7 REPLIES

Master Mentor

@Maxim Neaga

Can you please check whether the Hive clients are installed on the Atlas node?

Also, can you please let us know whether you have set up Ranger? If so, have you added the proper policies/permissions?

Contributor

@Jay Kumar SenSharma

Yes, I have Hive clients installed on all nodes.

I do have Ranger installed, but all plugins are disabled, so I do not think it affects it in any way.

I do see the messages about the new database posted to the Kafka ATLAS_HOOK topic with

./kafka-console-consumer.sh --zookeeper localhost:2181 --topic ATLAS_HOOK --from-beginning

but for some reason, it does not work with --bootstrap-server:

./kafka-console-consumer.sh --bootstrap-server localhost:6667 --topic ATLAS_HOOK --from-beginning

[2018-09-12 14:41:32,010] WARN [Consumer clientId=consumer-1, groupId=console-consumer-67769] Connection to node -1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
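
A note on the warning above, offered as a guess rather than a diagnosis: "Connection to node -1" means the client could not reach any broker at the bootstrap address it was given. The --zookeeper form does not contact the broker that way, which may be why it behaves differently. If the broker is bound to the machine's hostname rather than to localhost, localhost:6667 will not connect. The sketch below prints the host:port pairs from the listeners line of a broker properties file so they can be passed to --bootstrap-server. The function name bootstrap_from_listeners is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kafka): print the host:port pairs from
# the "listeners" setting of a broker properties file, comma-separated,
# in the form expected by --bootstrap-server.
# Usage: bootstrap_from_listeners <properties-file>
bootstrap_from_listeners() {
    # Take the value of the last uncommented "listeners=" line,
    # split it on commas, strip the "PROTOCOL://" prefix from each
    # entry, and join the entries back together with commas.
    sed -n 's/^[[:space:]]*listeners[[:space:]]*=[[:space:]]*//p' "$1" \
        | tail -n 1 \
        | tr ',' '\n' \
        | sed 's|^[A-Za-z0-9_]*://||' \
        | paste -sd, -
}
```

The result could then be used as `./kafka-console-consumer.sh --bootstrap-server "$(bootstrap_from_listeners /path/to/server.properties)" --topic ATLAS_HOOK --from-beginning`, where the properties path is a placeholder for your install.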

Expert Contributor

@Maxim Neaga It is safe to ignore the error related to __AtlasUserProfile. It's a false positive.

New Contributor

Hi, I am facing the same issue even after changing the parameter offsets.topic.replication.factor to 1 in the Kafka configuration. Note that I have CDP 7.1.7 and a total of 8 brokers.

I am able to import using import-hive.sh, but not using the hook. Any suggestions will be appreciated.

Thanks, Syed.

Cloudera Employee

I was seeing similar errors before. Changing offsets.topic.replication.factor to 1 seemed to resolve the issue; I had only one Kafka broker. I don't see the errors now, and the Hive tables come in.

New Contributor

Hi! Did you resolve the issue?