Member since 04-24-2016
52 Posts, 11 Kudos Received, 0 Solutions
06-27-2016
10:19 AM
@Joy I haven't downloaded HDP. Because I want to learn how to configure and use Atlas on its own, I compiled Atlas 0.7 with Maven and downloaded Hive separately. So I am wondering how to solve this other than by downloading the jar file manually. Or is the cause that the Maven build didn't include this jar?
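One way to tell whether the Maven build left a class out of the hook directory is to scan the jars there for it. The sketch below is only an illustration and not part of Atlas: the function name `jars_containing` is hypothetical, and it assumes HIVE_AUX_JARS_PATH points at a single directory of jars.

```python
import os
import zipfile
from pathlib import Path
from typing import List


def jars_containing(class_name: str, jar_dir: str) -> List[str]:
    """Return the names of jars in jar_dir that contain the given class."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    matches.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid jar/zip archives.
            continue
    return matches


if __name__ == "__main__":
    # Assumption: HIVE_AUX_JARS_PATH is a single directory; "." is only a fallback.
    hook_dir = os.environ.get("HIVE_AUX_JARS_PATH", ".")
    print(jars_containing("com.google.gson.GsonBuilder", hook_dir))
```

An empty list means no jar in that directory provides the class, so it would have to come from somewhere else on Hive's classpath.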
06-27-2016
04:37 AM
I want to use the Hive hook to import metadata automatically. Following the official Atlas guide (http://atlas.apache.org/Bridge-Hive.html), I set up hive-site.xml, exported HIVE_AUX_JARS_PATH, and copied atlas-application.properties to the Hive conf directory. But when I enter the Hive CLI and type "show tables;" or any other command, it fails with:
NoClassDefFoundError: com/google/gson/GsonBuilder
How can I solve this?
In my <atlas-conf>/atlas-application.properties, most settings are the defaults; I have never changed them. The file is as follows:
######### Graph Database Configs #########
# Graph Storage
#atlas.graph.storage.backend=berkeleyje
#atlas.graph.storage.directory=${sys:atlas.home}/data/berkley
#Hbase as stoarge backend
atlas.graph.storage.backend=hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here - For more information refer http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2
atlas.graph.storage.hostname=localhost
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000
#Solr
#atlas.graph.index.search.backend=solr
# Solr cloud mode properties
#atlas.graph.index.search.solr.mode=cloud
#atlas.graph.index.search.solr.zookeeper-url=localhost:2181
#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr
# Graph Search Index
#ElasticSearch
atlas.graph.index.search.backend=elasticsearch
atlas.graph.index.search.directory=${sys:atlas.home}/data/es
atlas.graph.index.search.elasticsearch.client-only=false
atlas.graph.index.search.elasticsearch.local-mode=true
atlas.graph.index.search.elasticsearch.create.sleep=2000
######### Notification Configs #########
atlas.notification.embedded=true
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=localhost:9026
atlas.kafka.bootstrap.servers=localhost:9027
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.auto.offset.reset=smallest
atlas.kafka.hook.group.id=atlas
######### Hive Lineage Configs #########
# This models reflects the base super types for Data and Process
#atlas.lineage.hive.table.type.name=DataSet
#atlas.lineage.hive.process.type.name=Process
#atlas.lineage.hive.process.inputs.name=inputs
#atlas.lineage.hive.process.outputs.name=outputs
## Schema
atlas.lineage.hive.table.schema.query.hive_table=hive_table where name='%s'\, columns
atlas.lineage.hive.table.schema.query.Table=Table where name='%s'\, columns
## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443
######### Security Properties #########
# SSL config
atlas.enableTLS=false
#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks
#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks
# Authentication config
# enabled: true or false
atlas.http.authentication.enabled=false
# type: simple or kerberos
atlas.http.authentication.type=simple
######### Server Properties #########
atlas.rest.address=http://localhost:21000
# If enabled and set to true, this will run setup steps when the server starts
#atlas.server.run.setup.on.start=false
######### Entity Audit Configs #########
atlas.audit.hbase.tablename=ATLAS_ENTITY_AUDIT_EVENTS
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=localhost:2181
######### High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>
#### atlas.login.method {FILE,LDAP,AD} ####
atlas.login.method=FILE
### File path of users-credentials
atlas.login.credentials.file=${sys:atlas.home}/conf/users-credentials.properties
Finally, I noticed that the official guide lists these settings:
atlas.hook.hive.synchronous - boolean, true to run the hook synchronously. default false
atlas.hook.hive.numRetries - number of retries for notification failure. default 3
atlas.hook.hive.minThreads - core number of threads. default 5
atlas.hook.hive.maxThreads - maximum number of threads. default 5
atlas.hook.hive.keepAliveTime - keep alive time in msecs. default 10
atlas.hook.hive.queueSize - queue size for the threadpool. default 10000
Should I add these settings to atlas-application.properties? And should I start HiveServer2 and the Hive metastore service?
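Since the guide gives a default for each of the hook settings listed above, they presumably only need to be added when a different value is wanted. The sketch below shows that idea under stated assumptions: `parse_properties` is a simplified, hypothetical reader (it does not handle every Java properties feature), and the defaults are copied from the list above rather than read from Atlas itself.

```python
# Defaults as listed in the guide quoted above.
HOOK_DEFAULTS = {
    "atlas.hook.hive.synchronous": "false",
    "atlas.hook.hive.numRetries": "3",
    "atlas.hook.hive.minThreads": "5",
    "atlas.hook.hive.maxThreads": "5",
    "atlas.hook.hive.keepAliveTime": "10",
    "atlas.hook.hive.queueSize": "10000",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def effective_hook_settings(text):
    """Return each hook setting with the file's value, or the listed default."""
    props = parse_properties(text)
    return {key: props.get(key, default) for key, default in HOOK_DEFAULTS.items()}
```

Passing the contents of atlas-application.properties to `effective_hook_settings` shows which of the six settings the file overrides and which fall back to the listed defaults.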
06-27-2016
03:33 AM
@Vadim I am trying to import metadata from Hive into Atlas. I created a table in the Hive CLI and ran {atlas_home}/bin/import-hive.sh. The metadata was imported successfully, but the Atlas Web UI reported that no lineage data was found. I expected it to show the lineage between Hive and Atlas, but it showed nothing. How can I get lineage to appear when I run {atlas_home}/bin/import-hive.sh? Thank you very much.
06-25-2016
01:21 AM
@Ayub Pathan These issues still exist. I posted the details in my last answer; please check.
06-25-2016
01:04 AM
@Ayub Pathan These issues still exist. First, I type "hive" to enter the Hive CLI. When I type "show tables;", it reports this error:
hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
Then I export HIVE_AUX_JARS_PATH and start the HiveServer2 and metastore services with the commands "hiveserver2" and "hive --service metastore". They report this error:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/gson/GsonBuilder
The imported metadata also has no lineage data. What can I do next?
My atlas-application.properties is shown below. (Most of the settings are the defaults; I have never changed them. Should I uncomment the atlas.lineage.*.*.* settings?)
######### Graph Database Configs #########
# Graph Storage
#atlas.graph.storage.backend=berkeleyje
#atlas.graph.storage.directory=${sys:atlas.home}/data/berkley
#Hbase as stoarge backend
atlas.graph.storage.backend=hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here - For more information refer http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2
atlas.graph.storage.hostname=localhost
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000
#Solr
#atlas.graph.index.search.backend=solr
# Solr cloud mode properties
#atlas.graph.index.search.solr.mode=cloud
#atlas.graph.index.search.solr.zookeeper-url=localhost:2181
#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr
# Graph Search Index
#ElasticSearch
atlas.graph.index.search.backend=elasticsearch
atlas.graph.index.search.directory=${sys:atlas.home}/data/es
atlas.graph.index.search.elasticsearch.client-only=false
atlas.graph.index.search.elasticsearch.local-mode=true
atlas.graph.index.search.elasticsearch.create.sleep=2000
######### Notification Configs #########
atlas.notification.embedded=true
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=localhost:9026
atlas.kafka.bootstrap.servers=localhost:9027
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.auto.offset.reset=smallest
atlas.kafka.hook.group.id=atlas
######### Hive Lineage Configs #########
# This models reflects the base super types for Data and Process
#atlas.lineage.hive.table.type.name=DataSet
#atlas.lineage.hive.process.type.name=Process
#atlas.lineage.hive.process.inputs.name=inputs
#atlas.lineage.hive.process.outputs.name=outputs
## Schema
atlas.lineage.hive.table.schema.query.hive_table=hive_table where name='%s'\, columns
atlas.lineage.hive.table.schema.query.Table=Table where name='%s'\, columns
## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443
######### Security Properties #########
# SSL config
atlas.enableTLS=false
#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks
#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks
# Authentication config
# enabled: true or false
atlas.http.authentication.enabled=false
# type: simple or kerberos
atlas.http.authentication.type=simple
######### Server Properties #########
atlas.rest.address=http://localhost:21000
# If enabled and set to true, this will run setup steps when the server starts
#atlas.server.run.setup.on.start=false
######### Entity Audit Configs #########
atlas.audit.hbase.tablename=ATLAS_ENTITY_AUDIT_EVENTS
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=localhost:2181
######### High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>
#### atlas.login.method {FILE,LDAP,AD} ####
atlas.login.method=FILE
### File path of users-credentials
atlas.login.credentials.file=${sys:atlas.home}/conf/users-credentials.properties
06-24-2016
06:02 AM
Hi @Ayub Khan, I opened the Hive CLI and executed what you suggested. The output was as follows:
hive> create table sample (name String);
OK
Time taken: 0.92 seconds
hive> create table sample_ctas as select * from sample;
Query ID = hadoop_20160624135004_3a9c1e30-1c10-4433-bb58-7408471a0fd9
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2016-06-24 13:50:06,138 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local293851089_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://localhost:9000/user/hive/warehouse/.hive-staging_hive_2016-06-24_13-50-04_539_811658628250176459-1/-ext-10001
Moving data to: hdfs://localhost:9000/user/hive/warehouse/sample_ctas
Table default.sample_ctas stats: [numFiles=1, numRows=0, totalSize=0, rawDataSize=0]
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 45 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.831 seconds
This shows that I successfully created a table through CTAS. I then ran {ATLAS_HOME}/bin/import-hive.sh. After it succeeded, I visited http://localhost:21000 and found the table sample_ctas, but there is still no lineage data.
06-24-2016
02:15 AM
1 Kudo
Question 1: The official guide (http://atlas.apache.org/) includes the following description:
Data Classification
Import or define taxonomy business-oriented annotations for data
Define, annotate, and automate capture of relationships between data sets and underlying elements including source, target, and derivation processes
How is the automated capture of relationships between data sets implemented? And to define taxonomy annotations for data, we must add tags to the data, right?
Question 2: Besides the Atlas Web UI in the browser, what other ways are there to add tags to data?
06-23-2016
02:22 PM
1 Kudo
Question 1: What are the practical applications of auditing in Atlas? The official guide (http://atlas.apache.org/) describes auditing as follows:
Capture security access information for every application, process, and interaction with data
Capture the operational information for execution, steps, and activities
These descriptions seem very abstract to me. What are the specific use cases for auditing?
Question 2: How do I configure and use auditing? I could not find any configuration information in the official guide.
Question 3: I remember that the Web UI of an older Atlas version had an Audit tab that could be clicked in the browser, but I cannot find it in the Atlas 0.7 Web UI. Why?
06-15-2016
02:05 PM
Thank you very much. The metadata was imported successfully, but I have further questions.
1. When I visit localhost:21000 and click the imported hive_table, it shows "No lineage data found". This Hive table was imported from Hive into Atlas, so I expected it to show the lineage between Hive and Atlas, but it showed nothing. How can I get it to show lineage data?
2. I tried to configure the Hive hook according to the official guide:
(1) Set "hive.exec.post.hooks" and "atlas.cluster.name" in hive-site.xml.
(2) Add 'export HIVE_AUX_JARS_PATH=<atlas package>/hook/hive' to hive-env.sh.
(3) Copy <atlas-conf>/atlas-application.properties to the Hive conf directory.
After this configuration, I typed "hive" to enter the Hive CLI and tried to create a Hive table, but it failed with:
hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.atlas.hive.hook.HiveHook)
java.lang.ClassNotFoundException: org.apache.atlas.hive.hook.HiveHook
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:60)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1309)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1293)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1516)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
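A ClassNotFoundException means the JVM could not find the class on its classpath, so it can help to confirm that each of the three configuration steps in this post actually took effect. The sketch below covers only step (1): it reads a hive-site.xml document and reports whether the two properties named in the post are present. The function names are hypothetical and the check is an illustration, not something Atlas or Hive provides.

```python
import xml.etree.ElementTree as ET

HOOK_CLASS = "org.apache.atlas.hive.hook.HiveHook"


def hive_site_properties(xml_text):
    """Return {name: value} for each <property> in a hive-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_hook_settings(xml_text):
    """List problems with the two hook-related settings named in the post."""
    props = hive_site_properties(xml_text)
    problems = []
    if HOOK_CLASS not in props.get("hive.exec.post.hooks", ""):
        problems.append("hive.exec.post.hooks does not name " + HOOK_CLASS)
    if not props.get("atlas.cluster.name"):
        problems.append("atlas.cluster.name is not set")
    return problems
```

If this returns an empty list, hive-site.xml already names the hook class, and the remaining place to look is whether the directory in HIVE_AUX_JARS_PATH is really visible to the Hive process.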
06-15-2016
07:24 AM
I could not find a MySQL client jar, so I tried adding mysql-connector-java-5.1.38-bin.jar to <atlas package>/bridge/hive/. (Is mysql-connector-java-5.1.38-bin.jar the same as the MySQL client jar?) It then showed a different error:
Exception in thread "main" com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
at com.sun.jersey.api.client.filter.HTTPBasicAuthFilter.handle(HTTPBasicAuthFilter.java:105)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.method(WebResource.java:634)
at org.apache.atlas.AtlasClient.callAPIWithResource(AtlasClient.java:1026)
at org.apache.atlas.AtlasClient.callAPIWithRetries(AtlasClient.java:642)
at org.apache.atlas.AtlasClient.callAPI(AtlasClient.java:1050)
at org.apache.atlas.AtlasClient.getType(AtlasClient.java:537)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.registerHiveDataModel(HiveMetaStoreBridge.java:510)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:551)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:998)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:934)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:852)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1302)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
... 11 more
Failed to import Hive Data Model!!!
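"Connection refused" in the trace above means nothing accepted the TCP connection at the address the Atlas client tried, so a quick check is whether anything is listening there before running the import. This is a minimal sketch, assuming the server address is the http://localhost:21000 used elsewhere in these posts; the helper name `is_listening` is hypothetical.

```python
import socket
from urllib.parse import urlparse


def is_listening(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    # Fall back to the scheme's usual port when the URL does not give one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Assumption: Atlas is expected at the address used in these posts.
    print(is_listening("http://localhost:21000"))
```

A False result suggests the Atlas server is not running, or is running on a different host or port than the one the import script uses.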