How to import metadata from Hive into Atlas? The official Atlas guide may be wrong.

Explorer

After building Atlas, I set only ATLAS_HOME_DIR in atlas-env.sh. All other settings in atlas-env.sh and atlas-application.properties are left at their defaults.

I tried to import metadata by following http://atlas.apache.org/Bridge-Hive.html

After setting $HIVE_CONF_DIR, I found that I can't add the following configuration to atlas-application.properties.

<property>
	<name>atlas.cluster.name</name>
	<value>primary</value>
</property>

This is XML, but atlas-application.properties is not an XML file, so I can't add it there.

Is the official Atlas guide inaccurate?

I skipped this setting and ran import-hive.sh. It printed the following:

Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: 
Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Caused by: java.lang.reflect.InvocationTargetException
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.

What should I do next to import metadata into Atlas?

1 ACCEPTED SOLUTION

@Ethan Hsieh

Looks like there is a typo in the documentation. The config block below should be added to hive-site.xml.

    <property>
      <name>atlas.cluster.name</name>
      <value>primary</value>
    </property>

Also, the issue here is that the metastore client requires the "com.mysql.jdbc.Driver" class on the classpath. Please download the appropriate jar for that class (for example: http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.18/mysql-connector-java-5.1.18.jar) and place it under ATLAS_HOME/bridge/hive. This should fix the issue.
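The copy step can be sketched as a small shell function. This is only an illustration: the function name install_jdbc_driver and the paths in the usage line are hypothetical, and it assumes the bridge/hive directory already exists under your Atlas home.

```shell
#!/bin/sh
# install_jdbc_driver JAR ATLAS_HOME
# Hypothetical helper: copies a JDBC driver jar into ATLAS_HOME/bridge/hive
# so that import-hive.sh can find the driver class on its classpath.
install_jdbc_driver() {
    jar="$1"
    atlas_home="$2"
    target="$atlas_home/bridge/hive"

    # Refuse to continue if the jar or the bridge directory is missing.
    if [ ! -f "$jar" ]; then
        echo "driver jar not found: $jar" >&2
        return 1
    fi
    if [ ! -d "$target" ]; then
        echo "bridge directory not found: $target" >&2
        return 1
    fi

    cp "$jar" "$target/" && echo "copied $(basename "$jar") to $target"
}
```

Usage might look like `install_jdbc_driver /path/to/mysql-connector-java-5.1.18.jar "$ATLAS_HOME"`, with both paths adjusted to your installation.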

Let me know if you face any further issues after following the above steps. Happy to help!

-Ayub Khan



@Ethan Hsieh

Looks like the Hive metastore is using MySQL in your case. Add the MySQL client jar to <atlas package>/bridge/hive/. That should work.

Ideally, import-hive.sh should use the Hive classpath so that all Hive dependencies are included. Currently, we bundle the Hive dependencies as well, hence this issue when Hive uses a non-default driver.

Details: https://issues.apache.org/jira/browse/ATLAS-96

Hope this helps.

Thanks and Regards,

Sindhu

Explorer

I could not find a MySQL client jar, so I added mysql-connector-java-5.1.38-bin.jar to <atlas package>/bridge/hive/ (is mysql-connector-java-5.1.38-bin.jar the same as the MySQL client jar?).

It then showed a different error:

Exception in thread "main" com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
    at com.sun.jersey.api.client.filter.HTTPBasicAuthFilter.handle(HTTPBasicAuthFilter.java:105)
    at com.sun.jersey.api.client.Client.handle(Client.java:652)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.method(WebResource.java:634)
    at org.apache.atlas.AtlasClient.callAPIWithResource(AtlasClient.java:1026)
    at org.apache.atlas.AtlasClient.callAPIWithRetries(AtlasClient.java:642)
    at org.apache.atlas.AtlasClient.callAPI(AtlasClient.java:1050)
    at org.apache.atlas.AtlasClient.getType(AtlasClient.java:537)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.registerHiveDataModel(HiveMetaStoreBridge.java:510)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:551)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
    at sun.net.www.http.HttpClient.New(HttpClient.java:308)
    at sun.net.www.http.HttpClient.New(HttpClient.java:326)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:998)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:934)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:852)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1302)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
    ... 11 more
Failed to import Hive Data Model!!!


Add the atlas.rest.address=<Atlas-host>:<port> property to atlas-application.properties and try running the import script again.
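To check what the import script would pick up, you can read the active (uncommented) value from the properties file. The sketch below is only an illustration: get_prop is a hypothetical helper, not part of Atlas, and the file path in the usage line is an assumption about where your configuration lives.

```shell
#!/bin/sh
# get_prop FILE KEY
# Hypothetical helper: prints the value of the last uncommented KEY=VALUE
# line in a Java-style properties file. Prints nothing if the key is
# missing or only appears in a commented line.
get_prop() {
    # Escape the dots in the key so they match literally in the regex.
    key=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}
```

For example, `get_prop "$ATLAS_HOME/conf/atlas-application.properties" atlas.rest.address` prints the configured address. If the value is present and you still get "Connection refused", it is also worth confirming that the Atlas server is actually running at that address.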

@Ethan Hsieh, can you confirm whether you can successfully execute "show databases;" in the Hive shell?

Looks like you don't have a mysql-connector jar in $HIVE_HOME/lib. Please place the connector in the Hive lib directory and try running the import script.


Explorer

Thank you very much. The metadata was imported successfully.

But I have some other questions.

1. When I visit localhost:21000 and click the imported hive_table, it shows "No lineage data found". Since this Hive table was imported from Hive into Atlas, I expected it to show the lineage between Hive and Atlas, but it shows nothing. How can I make it show lineage data?

2. I tried to configure the Hive hook according to the official guide: (1) set "hive.exec.post.hooks" and "atlas.cluster.name" in hive-site.xml; (2) add 'export HIVE_AUX_JARS_PATH=<atlas package>/hook/hive' to hive-env.sh; (3) copy <atlas-conf>/atlas-application.properties to the Hive conf directory.

After this configuration, I typed "hive" to enter the Hive CLI and tried to create a Hive table, but it showed:

hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.atlas.hive.hook.HiveHook)
java.lang.ClassNotFoundException: org.apache.atlas.hive.hook.HiveHook
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:278)
        at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:60)
        at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1309)
        at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1293)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1516)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)


1. When you import metadata for already existing Hive tables, no lineage is shown because Hive does not provide these details to Atlas. If you want to see lineage in action, use create table as select (CTAS).

For example, open the Hive CLI and create the tables below:

create table sample (name String);
create table sample_ctas as select * from sample;

2. After setting the mentioned configuration, did you restart your Hive metastore and HiveServer2? This should solve the issue.
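Before restarting, it may also help to confirm that the directory you exported as HIVE_AUX_JARS_PATH really contains jars, since a ClassNotFoundException for the hook class usually means the hook jars are not on Hive's classpath. A rough sketch, where count_jars is a hypothetical helper and not an Atlas or Hive tool:

```shell
#!/bin/sh
# count_jars DIR
# Hypothetical helper: prints how many *.jar files sit directly inside DIR.
# Returns non-zero if DIR is not a directory.
count_jars() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi

    n=0
    for f in "$dir"/*.jar; do
        # The glob stays unexpanded when nothing matches, so test each entry.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Running `count_jars "$HIVE_AUX_JARS_PATH"` in the same shell that starts Hive should print a number greater than zero. A result of 0, or a "not a directory" message, suggests the variable is not pointing at <atlas package>/hook/hive.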

Let me know if you are still facing the issue. Happy to help!!

-- Ayub Khan

Explorer

Hi @Ayub Khan,

I opened the Hive CLI and executed the statements you suggested. The output was as follows:

hive> create table sample (name String);
OK
Time taken: 0.92 seconds
hive> create table sample_ctas as select * from sample;
Query ID = hadoop_20160624135004_3a9c1e30-1c10-4433-bb58-7408471a0fd9
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2016-06-24 13:50:06,138 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local293851089_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://localhost:9000/user/hive/warehouse/.hive-staging_hive_2016-06-24_13-50-04_539_811658628250176459-1/-ext-10001
Moving data to: hdfs://localhost:9000/user/hive/warehouse/sample_ctas
Table default.sample_ctas stats: [numFiles=1, numRows=0, totalSize=0, rawDataSize=0]
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 45 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.831 seconds

This shows that I successfully created a table through CTAS.

Then I ran {ATLAS_HOME}/bin/import-hive.sh.

After it succeeded, I visited http://localhost:21000 and found the table sample_ctas, but there is still no lineage data.

5223-screenshot-from-2016-06-24-140455.png

Explorer

@Ayub Pathan

These issues still exist. I posted the details in my last answer; please check.

Explorer

@Ayub Pathan


These issues still exist.

First, I type "hive" to enter the Hive CLI. When I type "show tables;", it reports an error like this:

hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook

Then I export HIVE_AUX_JARS_PATH and start the HiveServer2 and metastore services with the commands "hiveserver2" and "hive --service metastore".

They report an error like this:

Exception in thread "main" java.lang.NoClassDefFoundError: com/google/gson/GsonBuilder

The imported metadata also still has no lineage data. What can I do next?

My atlas-application.properties is shown below:

(Most of these are default settings that I never changed. Should I uncomment the atlas.lineage.*.*.* properties?)

#########  Graph Database Configs  #########
# Graph Storage
#atlas.graph.storage.backend=berkeleyje
#atlas.graph.storage.directory=${sys:atlas.home}/data/berkley

#Hbase as stoarge backend
atlas.graph.storage.backend=hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here - For more information refer http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2
atlas.graph.storage.hostname=localhost
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000

#Solr
#atlas.graph.index.search.backend=solr

# Solr cloud mode properties
#atlas.graph.index.search.solr.mode=cloud
#atlas.graph.index.search.solr.zookeeper-url=localhost:2181

#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr

# Graph Search Index
#ElasticSearch
atlas.graph.index.search.backend=elasticsearch
atlas.graph.index.search.directory=${sys:atlas.home}/data/es
atlas.graph.index.search.elasticsearch.client-only=false
atlas.graph.index.search.elasticsearch.local-mode=true
atlas.graph.index.search.elasticsearch.create.sleep=2000


#########  Notification Configs  #########
atlas.notification.embedded=true
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=localhost:9026
atlas.kafka.bootstrap.servers=localhost:9027
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.auto.offset.reset=smallest
atlas.kafka.hook.group.id=atlas


#########  Hive Lineage Configs  #########
# This models reflects the base super types for Data and Process
#atlas.lineage.hive.table.type.name=DataSet
#atlas.lineage.hive.process.type.name=Process
#atlas.lineage.hive.process.inputs.name=inputs
#atlas.lineage.hive.process.outputs.name=outputs

## Schema
atlas.lineage.hive.table.schema.query.hive_table=hive_table where name='%s'\, columns
atlas.lineage.hive.table.schema.query.Table=Table where name='%s'\, columns

## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443

#########  Security Properties  #########

# SSL config
atlas.enableTLS=false

#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks

#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks

# Authentication config

# enabled:  true or false
atlas.http.authentication.enabled=false
# type:  simple or kerberos
atlas.http.authentication.type=simple

#########  Server Properties  #########
atlas.rest.address=http://localhost:21000
# If enabled and set to true, this will run setup steps when the server starts
#atlas.server.run.setup.on.start=false

#########  Entity Audit Configs  #########
atlas.audit.hbase.tablename=ATLAS_ENTITY_AUDIT_EVENTS
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=localhost:2181

#########  High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>


#### atlas.login.method {FILE,LDAP,AD} ####
atlas.login.method=FILE

### File path of users-credentials
atlas.login.credentials.file=${sys:atlas.home}/conf/users-credentials.properties