
Atlas can't see Hive tables

Re: Atlas can't see Hive tables

New Contributor

Thank you! The message I keep getting throughout the application.log is:

2018-07-20 14:07:52,071 INFO - [pool-2-thread-31 - a43d6555-0b77-4fbd-b093-c71c0d45a080:] ~ Audit: UNKNOWN/10.0.120.250-10.0.120.250 performed request GET http://dbmnprodza51001.serv.cdc.fr:21000/api/atlas/admin/status (10.0.120.250) at time 2018-07-20T12:07Z (AUDIT:117)


Re: Atlas can't see Hive tables

New Contributor

Thank you, but the only message I have in the application.log is:

2018-07-20 14:07:52,071 INFO - [pool-2-thread-31 - a43d6555-0b77-4fbd-b093-c71c0d45a080:] ~ Audit: UNKNOWN/10.0.120.250-10.0.120.250 performed request GET http://localhost:21000/api/atlas/admin/status (10.0.120.250) at time 2018-07-20T12:07Z (AUDIT:117)
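
For what it's worth, that audit line only records a poll of the Atlas status endpoint (typically a health check such as Ambari's), not Hive hook activity. The same endpoint can be queried by hand; this is just a sketch, assuming the status endpoint is reachable without credentials, as the UNKNOWN user in the audit line suggests:

$ curl -s http://localhost:21000/api/atlas/admin/status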

Re: Atlas can't see Hive tables

Expert Contributor

I'm running HDP 2.6.5 and I have experienced (and SOLVED) a problem that may be related to this, and that in any case may be of help to someone else with similar problems importing entities from Hive.

My cluster is Kerberized, and when I run the import-hive.sh script as described in https://community.hortonworks.com/articles/61274/import-hive-metadata-into-atlas.html, I get the following error:

$ /usr/hdp/2.6.5.0-292/atlas/hook-bin/import-hive.sh -Dsun.security.jgss.debug=true -Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/etc/atlas/conf/atlas_jaas.conf
Using Hive configuration directory [/etc/hive/conf]
Log file for import is /usr/hdp/2.6.5.0-292/atlas/logs/import-hive.log
Usage 1: import-hive.sh [-d <database> OR --database <database>]  Imports specified database and its tables ...
...
Failed to import Hive Meta Data!!

To see what was happening, I edited the import-hive.sh script and added an echo before the executed command:

echo "${JAVA_BIN}" ${JAVA_PROPERTIES} -cp "${CP}" org.apache.atlas.hive.bridge.HiveMetaStoreBridge $allargs

The echoed java command turns out to be something like this:

$ /usr/java/jdk1.8.0_152/bin/java -Datlas.log.dir=/usr/hdp/2.6.5.0-292/atlas/logs -Datlas.log.file=import-hive.log -Dlog4j.configuration=atlas-hive-import-log4j.xml \
-Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/etc/atlas/conf/atlas_jaas.conf \
-cp ":<VERY-LONG-LIST-OF-JARS>:/usr/hdp/2.6.5.0-292/tez/lib/*:/usr/hdp/2.6.5.0-292/tez/conf" \
org.apache.atlas.hive.bridge.HiveMetaStoreBridge \
-Dsun.security.jgss.debug=true -Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/etc/atlas/conf/atlas_jaas.conf

From this I found out two things:

  1. The property definition options indicated in the article for the case of a Kerberized cluster ARE NOT NECESSARY, because the script already picks them up from the HDP environment. You can see them duplicated at the end of the command.
  2. MOST IMPORTANT: the classpath is malformed. It repeats many JARs and lists every single JAR file in many Hadoop lib folders (including all the JARs in hive-client/lib) instead of using a glob ("/lib/*"). This seems to hit an argument-length limit in Bash (or in Java), and appears to be the reason the command fails!! A sketch of how the collapsing can be automated follows this list.
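
For illustration only, here is a minimal sketch of how such a classpath could be collapsed automatically. collapse_cp is a hypothetical helper, not part of import-hive.sh; it assumes CP holds the colon-separated classpath the script builds:

# Hypothetical helper, not part of import-hive.sh: collapse an exploded
# classpath into one glob per JAR directory, dropping duplicates.
collapse_cp() {
  local cp="$1" out="" seen="" entry dir
  local -a entries
  IFS=':' read -ra entries <<< "$cp"
  for entry in "${entries[@]}"; do
    case "$entry" in
      *.jar) dir="$(dirname "$entry")/*" ;;  # one glob per JAR directory
      *)     dir="$entry" ;;                 # keep conf dirs and globs as-is
    esac
    case ":$seen:" in
      *":$dir:"*) ;;                         # skip entries already emitted
      *) seen="$seen:$dir"; out="${out:+$out:}$dir" ;;
    esac
  done
  printf '%s\n' "$out"
}

CP="$(collapse_cp "$CP")"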

By editing the classpath and replacing the nearly one hundred individually listed JAR files with their parent folders plus a glob, I was able to drastically reduce the length of the executed command and run it without errors, as shown below:

$ /usr/java/jdk1.8.0_152/bin/java -Datlas.log.dir=/usr/hdp/current/atlas-server/logs -Datlas.log.file=import-hive.log -Dlog4j.configuration=atlas-hive-import-log4j.xml \
  -Djavax.security.auth.useSubjectCredsOnly=false -Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/etc/atlas/conf/atlas_jaas.conf \
  -cp ':/usr/hdp/current/atlas-server/hook/hive/atlas-hive-plugin-impl/*:/usr/hdp/current/hive-client/conf:/usr/hdp/current/hive-client/lib/*:mysql-connector-java.jar:postgresql-jdbc3.jar:postgresql-jdbc.jar:/usr/hdp/2.6.5.0-292/hadoop/conf:/usr/hdp/2.6.5.0-292/hadoop/lib/*:/usr/hdp/2.6.5.0-292/hadoop/*:/usr/hdp/2.6.5.0-292/hadoop-hdfs/:/usr/hdp/2.6.5.0-292/hadoop-hdfs/lib/*:/usr/hdp/2.6.5.0-292/hadoop-hdfs/*:/usr/hdp/2.6.5.0-292/hadoop-yarn/lib/*:/usr/hdp/2.6.5.0-292/hadoop-yarn/*:/usr/hdp/2.6.5.0-292/hadoop-mapreduce/lib/*:/usr/hdp/2.6.5.0-292/hadoop-mapreduce/*:/usr/hdp/2.6.5.0-292/tez/*:/usr/hdp/2.6.5.0-292/tez/lib/*:/usr/hdp/2.6.5.0-292/tez/conf' \
  org.apache.atlas.hive.bridge.HiveMetaStoreBridge

Search Subject for Kerberos V5 INIT cred (<<DEF>>, sun.security.jgss.krb5.Krb5InitCredential)
Search Subject for SPNEGO INIT cred (<<DEF>>, sun.security.jgss.spnego.SpNegoCredElement)
Search Subject for Kerberos V5 INIT cred (<<DEF>>, sun.security.jgss.krb5.Krb5InitCredential)
$

After this, all the entities from Hive were imported into Atlas.
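
To double-check the import, the new entities can also be listed through Atlas's REST search API. This is just a sketch, assuming default admin credentials and Atlas listening locally on port 21000:

$ curl -s -u admin:admin 'http://localhost:21000/api/atlas/v2/search/dsl?typeName=hive_table&limit=5'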

I hope this will be of help to someone else, and that somebody at Hortonworks will fix the script and the related documentation.