Support Questions
Find answers, ask questions, and share your expertise

Hive metadata does not show up in Atlas with hook

Explorer

I have Hive and Atlas installed on HDP 2.6.5 with Hive hook for Atlas enabled and no changes to the configuration. I am able to successfully import Hive metadata with import-hive.sh, but the Hive hook does not seem to work. When I create a database in Hive, it does not show up in Atlas.

The only things I see in the logs are

atlas/application.log:

ERROR - [pool-2-thread-5 - de30e17a-1db7-4aad-8f34-a61a27b33cff:] ~ graph rollback due to exception AtlasBaseException:Instance __AtlasUserProfile with unique attribute {name=admin} does not exist (GraphTransactionInterceptor:73)

hive logs:

./hiveserver2.log.2018-09-10:2018-09-10 14:55:42,587 INFO [HiveServer2-Background-Pool: Thread-61]: hook.AtlasHook (AtlasHook.java:<clinit>(99)) - Created Atlas Hook

./hiveserver2.log:2018-09-11 09:04:20,100 INFO [HiveServer2-Background-Pool: Thread-3201]: log.PerfLogger (PerfLogger.java:PerfLogBegin(149)) - <PERFLOG method=PostHook.org.apache.atlas.hive.hook.HiveHook from=org.apache.hadoop.hive.ql.Driver>


When I search for the database name in /kafka-logs/, it only shows up in ./ATLAS_HOOK-0/00000000000000000014.log, and the entry looks like

{"msgSourceIP":"172.18.181.235","msgCreatedBy":"hive","msgCreationTime":1536681860109,"message":{"entities":{"referredEntities":{},"entities":[{"typeName":"hive_db","attributes":{"owner":"hive","ownerType":"USER","qualifiedName":"oyster9@bigcentos","clusterName":"bigcentos","name":"oyster9","location":"hdfs://host.com:8020/apps/hive/warehouse/oyster9.db","parameters":{}},"guid":"-82688521591125","version":0}]},"type":"ENTITY_CREATE_V2","user":"hive"},"version":{"version":"1.0.0"},"msgCompressionKind":"NONE","msgSplitIdx":1,"msgSplitCount":1}

This tells me that the message about the new DB is published to the ATLAS_HOOK Kafka topic, but is not consumed by Atlas.
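As a sanity check that the hook-side message is well-formed, here is a minimal Python sketch that parses a trimmed copy of the notification above (field values taken from my log entry) and pulls out the entity it describes:

```python
import json

# Trimmed copy of the hook notification found in ./ATLAS_HOOK-0/...log
# (only the fields relevant here; values are from the log entry above).
raw = """{"msgCreatedBy": "hive", "message": {"type": "ENTITY_CREATE_V2",
"entities": {"referredEntities": {}, "entities": [{"typeName": "hive_db",
"attributes": {"qualifiedName": "oyster9@bigcentos", "name": "oyster9"},
"guid": "-82688521591125"}]}}}"""

msg = json.loads(raw)
for entity in msg["message"]["entities"]["entities"]:
    # A valid ENTITY_CREATE_V2 payload carries the type and qualified name
    # Atlas would use to register the new database.
    print(entity["typeName"], entity["attributes"]["qualifiedName"])
# prints: hive_db oyster9@bigcentos
```

Since the payload parses cleanly and carries the expected `hive_db` entity, the hook side appears to be working; the failure is on the Atlas consumer side.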

I do not know where to look next. My goal is for a database created in Hive to show up in Atlas automatically.

1 ACCEPTED SOLUTION

Accepted Solutions

Explorer

Changing offsets.topic.replication.factor in the Kafka config to 1 (the number of brokers) addressed the issue.
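For reference, assuming the standard Kafka broker configuration file, the change is just:

```properties
# server.properties (single-broker cluster)
offsets.topic.replication.factor=1
```

Note that this setting only takes effect when the __consumer_offsets topic is first created; if that topic already exists with a higher replication factor, it may need to be deleted or reassigned before the change has any effect.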

View solution in original post

5 REPLIES

Super Mentor

@Maxim Neaga

Can you please check if you have the Hive Clients installed on Atlas Node?

Also, can you please let us know if you have set up Ranger? If yes, have you added the proper policies/permissions?

Explorer

@Jay Kumar SenSharma

Yes, I have Hive clients installed on all nodes.

I do have Ranger installed, but all plugins are disabled, so I do not think it is a factor.

I do see the messages about the new database posted to the Kafka ATLAS_HOOK topic with

./kafka-console-consumer.sh --zookeeper localhost:2181 --topic ATLAS_HOOK --from-beginning

but for some reason, it does not work with --bootstrap-server:

./kafka-console-consumer.sh --bootstrap-server localhost:6667 --topic ATLAS_HOOK --from-beginning

[2018-09-12 14:41:32,010] WARN [Consumer clientId=consumer-1, groupId=console-consumer-67769] Connection to node -1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

Expert Contributor

@Maxim Neaga It is safe to ignore the error related to __AtlasUserProfile. It's a false positive.

Explorer

Changing offsets.topic.replication.factor in the Kafka config to 1 (the number of brokers) addressed the issue.

View solution in original post

Cloudera Employee

I was seeing similar errors before. Since I had only one Kafka broker, changing the offsets.topic.replication.factor property to 1 resolved the issue. I don't see the errors now, and the Hive tables come in.