Member since 04-24-2016
Posts: 52 · Kudos Received: 11 · Solutions: 0
12-02-2016
08:42 AM
@Greg Keys
Thank you very much. Could you give me a list that shows the features of each Atlas release? I remember downloading such a document somewhere, but I lost it. Could you send me this list?
... View more
11-30-2016
01:28 PM
1 Kudo
I want to write an introduction to Atlas, but the information provided on the official website atlas.apache.org is not enough. So I am looking for documents (for example .pdf or .ppt files) covering the following aspects:
- The advantages of Atlas
- The release notes or history of Atlas
- An introduction to the architecture of Atlas
- Other overall information
Thank you very much.
... View more
Labels:
09-14-2016
03:06 PM
@Chethana Krishnakumar Thank you very much. Importing the 1.4.7 jar into Sqoop 1.4.6 does solve the problem. But I am worried it may cause small problems later, so I came up with several alternatives: 1. The Sqoop version in HDP is 1.4.6, but as I mentioned before, the sqoop-1.4.6 obtained from the official site is not complete; could you give me a complete build of 1.4.6? 2. Could you provide a complete sqoop-1.4.7 distribution, not just the 1.4.7 jar? 3. I even tried the latest release, sqoop-1.99.7, but the official documentation only covers importing data from a relational database into HDFS; I want to know the steps for using it to import data from a relational database into Hive.
... View more
08-26-2016
06:18 AM
4 Kudos
I want to add a new user account for the Atlas Web UI, so I appended a line to <atlas-conf>/users-credentials.properties: zte=ADMIN::8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92
This means the user name is zte and the password is 123456. But when I log in to the Atlas Web UI with this user name and password, it looks like the screenshot below: the message at the top right says this account is not authorized for READ *. So, how can I set the authority/rights of my new user account? Thank you very much.
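A minimal sketch of how such an entry is built (the hash below is simply the SHA-256 of the plain-text password and matches the line added above; the note at the end about a separate policy file is an assumption about the file-based authorizer, not something confirmed in this thread):

# Generate the SHA-256 digest of the new password
echo -n "123456" | sha256sum
# -> 8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92

# Entry format in <atlas-conf>/users-credentials.properties:
#   username=GROUP::sha256-of-password
# so the line for the new account is:
#   zte=ADMIN::8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92

# Assumption: with file-based login, read/write permissions are usually granted separately
# in the authorization policy file referenced by the Atlas configuration, so the new
# user/group may also need an entry there, not only in users-credentials.properties.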
... View more
Labels:
08-23-2016
06:55 AM
@ckrishnakumar Thank you very much. But I can't find sqoop-1.4.7 on the official website: http://sqoop.apache.org/ The site shows that the latest stable release is 1.4.6 and does not provide a download for sqoop-1.4.7. Could you give me a link to download the 1.4.7 version, or send the jar to my email: dreamcoding@outlook.com
... View more
08-23-2016
06:05 AM
@ckrishnakumar The output of this command is empty. This is the result shown in the terminal; I also pasted a screenshot next to it. [hdfs@zte-1 sqoop-1.4.6]$ ls
bin CHANGELOG.txt conf ivy lib NOTICE.txt README.txt sqoop-patch-review.py src
build.xml COMPILING.txt docs ivy.xml LICENSE.txt pom-old.xml sqoop-1.4.6.jar sqoop-test-1.4.6.jar testdata
[hdfs@zte-1 sqoop-1.4.6]$ jar tvf sqoop-1.4.6.jar | grep .class | grep SqoopJobDataPublisher
[hdfs@zte-1 sqoop-1.4.6]$
... View more
08-23-2016
05:49 AM
How should I link these JARs? Should I copy them into <sqoop-home>/lib, or use the command: ln -s <atlas-home>/hook/sqoop/* <sqoop-home>/lib/ ?
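A sketch of both options under the placeholder paths used in this thread (either should make the Atlas sqoop-hook classes visible to Sqoop; this is not a confirmation of which one the bridge documentation intends):

# Option 1: symlink the Atlas sqoop-hook JARs into Sqoop's lib directory
ln -s <atlas-home>/hook/sqoop/*.jar <sqoop-home>/lib/

# Option 2: copy them instead of linking
cp <atlas-home>/hook/sqoop/*.jar <sqoop-home>/lib/

# Either way, verify they are now on Sqoop's classpath
ls <sqoop-home>/lib | grep -i atlas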
... View more
08-22-2016
09:36 AM
@ckrishnakumar After I added the atlas.rest.address property to sqoop-site.xml, the problem is the same: I search for sqoop_process in the Atlas Web UI and no result is found. The Hive hook does work, though; it captures the imported hive_table, which is shown in the Atlas Web UI. I pasted the output in the next answer, and it doesn't report any error. I remember that when I configured the Hive hook, I added some JAR paths to HIVE_AUX_JARS_PATH, but the Sqoop hook configuration has no such step. Is it necessary to add some JAR paths for Sqoop? It seems that the SqoopHook class doesn't run.
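For reference, a minimal sketch of how the two Sqoop-side properties discussed in this thread could sit together in <sqoop-conf>/sqoop-site.xml (the Atlas address below assumes the default http://localhost:21000 used elsewhere in this thread; adjust it to your Atlas server):

<property>
  <name>sqoop.job.data.publish.class</name>
  <value>org.apache.atlas.sqoop.hook.SqoopHook</value>
</property>
<property>
  <name>atlas.rest.address</name>
  <value>http://localhost:21000</value>
</property>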
... View more
08-22-2016
09:35 AM
@Ayub Pathan After I added the atlas.rest.address property to sqoop-site.xml, the problem is the same: I search for sqoop_process in the Atlas Web UI and no result is found. The Hive hook does work, though; it captures the imported hive_table, which is shown in the Atlas Web UI. I pasted the output in the next answer, and it doesn't report any error. I remember that when I configured the Hive hook, I added some JAR paths to HIVE_AUX_JARS_PATH, but the Sqoop hook configuration has no such step. Is it necessary to add some JAR paths for Sqoop? It seems that the SqoopHook class doesn't run.
... View more
08-22-2016
09:16 AM
@Ayub Pathan @ckrishnakumar The sqoop hook still doesn't work. Here is the console output for the executed sqoop command: sqoop import -connect jdbc:mysql://zte-1:3306/hive -username root -password admin -table TBLS -hive-import -hive-table sqoophook2
Warning: /var/local/hadoop/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /var/local/hadoop/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/08/23 01:04:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/08/23 01:04:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/08/23 01:04:04 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
16/08/23 01:04:04 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/08/23 01:04:05 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/08/23 01:04:05 INFO tool.CodeGenTool: Beginning code generation
16/08/23 01:04:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `TBLS` AS t LIMIT 1
16/08/23 01:04:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `TBLS` AS t LIMIT 1
16/08/23 01:04:06 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /var/local/hadoop/hadoop-2.6.0
Note: /tmp/sqoop-hdfs/compile/2606be5f25a97674311440065aac302d/TBLS.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/08/23 01:04:09 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/2606be5f25a97674311440065aac302d/TBLS.jar
16/08/23 01:04:09 WARN manager.MySQLManager: It looks like you are importing from mysql.
16/08/23 01:04:09 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
16/08/23 01:04:09 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
16/08/23 01:04:09 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
16/08/23 01:04:09 INFO mapreduce.ImportJobBase: Beginning import of TBLS
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/var/local/hadoop/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/var/local/hadoop/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/23 01:04:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/23 01:04:10 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/08/23 01:04:11 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/08/23 01:04:11 INFO client.RMProxy: Connecting to ResourceManager at zte-1/192.168.136.128:8032
16/08/23 01:04:16 INFO db.DBInputFormat: Using read commited transaction isolation
16/08/23 01:04:16 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`TBL_ID`), MAX(`TBL_ID`) FROM `TBLS`
16/08/23 01:04:17 INFO mapreduce.JobSubmitter: number of splits:4
16/08/23 01:04:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1471882959657_0001
16/08/23 01:04:19 INFO impl.YarnClientImpl: Submitted application application_1471882959657_0001
16/08/23 01:04:19 INFO mapreduce.Job: The url to track the job: http://zte-1:8088/proxy/application_1471882959657_0001/
16/08/23 01:04:19 INFO mapreduce.Job: Running job: job_1471882959657_0001
16/08/23 01:04:37 INFO mapreduce.Job: Job job_1471882959657_0001 running in uber mode : false
16/08/23 01:04:37 INFO mapreduce.Job: map 0% reduce 0%
16/08/23 01:05:05 INFO mapreduce.Job: map 25% reduce 0%
16/08/23 01:05:07 INFO mapreduce.Job: map 100% reduce 0%
16/08/23 01:05:08 INFO mapreduce.Job: Job job_1471882959657_0001 completed successfully
16/08/23 01:05:08 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=529788
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=426
HDFS: Number of bytes written=171
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=102550
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=102550
Total vcore-seconds taken by all map tasks=102550
Total megabyte-seconds taken by all map tasks=105011200
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=426
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1227
CPU time spent (ms)=3640
Physical memory (bytes) snapshot=390111232
Virtual memory (bytes) snapshot=3376676864
Total committed heap usage (bytes)=74018816
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=171
16/08/23 01:05:08 INFO mapreduce.ImportJobBase: Transferred 171 bytes in 57.2488 seconds (2.987 bytes/sec)
16/08/23 01:05:08 INFO mapreduce.ImportJobBase: Retrieved 3 records.
16/08/23 01:05:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `TBLS` AS t LIMIT 1
16/08/23 01:05:08 INFO hive.HiveImport: Loading uploaded data into Hive
16/08/23 01:05:19 INFO hive.HiveImport:
16/08/23 01:05:19 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/var/local/hadoop/hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
16/08/23 01:05:19 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
16/08/23 01:05:19 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/var/local/hadoop/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/08/23 01:05:19 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/var/local/hadoop/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/08/23 01:05:19 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
16/08/23 01:05:19 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/23 01:05:31 INFO hive.HiveImport: OK
16/08/23 01:05:31 INFO hive.HiveImport: Time taken: 3.481 seconds
16/08/23 01:05:31 INFO hive.HiveImport: Loading data to table default.sqoophook2
16/08/23 01:05:33 INFO hive.HiveImport: Table default.sqoophook2 stats: [numFiles=4, totalSize=171]
16/08/23 01:05:33 INFO hive.HiveImport: OK
16/08/23 01:05:33 INFO hive.HiveImport: Time taken: 1.643 seconds
16/08/23 01:05:35 INFO hive.HiveImport: Hive import complete.
16/08/23 01:05:35 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
... View more
08-17-2016
04:01 AM
1 Kudo
I installed Atlas and Sqoop separately and am not using HDP. After executing this command: sqoop import -connect
jdbc:mysql://master:3306/hive -username root -password admin -table
TBLS -hive-import -hive-table sqoophook1
Sqoop reports that the data was imported into Hive successfully, with no error. But when I check the Atlas UI and search for the sqoop_process type, nothing is found. Why? Here is my configuration process: Step 1: Set the following in <sqoop-conf>/sqoop-site.xml: <property>
<name>sqoop.job.data.publish.class</name>
<value>org.apache.atlas.sqoop.hook.SqoopHook</value>
</property> Step 2: Copy <atlas-conf>/atlas-application.properties to <sqoop-conf>. Step 3: Link <atlas-home>/hook/sqoop/*.jar into the Sqoop lib directory. Are these configuration steps wrong? Here is the output of: sqoop import -connect jdbc:mysql://zte-1:3306/hive -username root -password admin -table TBLS -hive-import -hive-table sqoophook2
Warning: /var/local/hadoop/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /var/local/hadoop/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/08/23 01:04:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/08/23 01:04:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/08/23 01:04:04 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
16/08/23 01:04:04 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/08/23 01:04:05 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/08/23 01:04:05 INFO tool.CodeGenTool: Beginning code generation
16/08/23 01:04:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `TBLS` AS t LIMIT 1
16/08/23 01:04:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `TBLS` AS t LIMIT 1
16/08/23 01:04:06 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /var/local/hadoop/hadoop-2.6.0
Note: /tmp/sqoop-hdfs/compile/2606be5f25a97674311440065aac302d/TBLS.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/08/23 01:04:09 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/2606be5f25a97674311440065aac302d/TBLS.jar
16/08/23 01:04:09 WARN manager.MySQLManager: It looks like you are importing from mysql.
16/08/23 01:04:09 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
16/08/23 01:04:09 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
16/08/23 01:04:09 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
16/08/23 01:04:09 INFO mapreduce.ImportJobBase: Beginning import of TBLS
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/var/local/hadoop/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/var/local/hadoop/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/23 01:04:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/23 01:04:10 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/08/23 01:04:11 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/08/23 01:04:11 INFO client.RMProxy: Connecting to ResourceManager at zte-1/192.168.136.128:8032
16/08/23 01:04:16 INFO db.DBInputFormat: Using read commited transaction isolation
16/08/23 01:04:16 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`TBL_ID`), MAX(`TBL_ID`) FROM `TBLS`
16/08/23 01:04:17 INFO mapreduce.JobSubmitter: number of splits:4
16/08/23 01:04:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1471882959657_0001
16/08/23 01:04:19 INFO impl.YarnClientImpl: Submitted application application_1471882959657_0001
16/08/23 01:04:19 INFO mapreduce.Job: The url to track the job: http://zte-1:8088/proxy/application_1471882959657_0001/
16/08/23 01:04:19 INFO mapreduce.Job: Running job: job_1471882959657_0001
16/08/23 01:04:37 INFO mapreduce.Job: Job job_1471882959657_0001 running in uber mode : false
16/08/23 01:04:37 INFO mapreduce.Job: map 0% reduce 0%
16/08/23 01:05:05 INFO mapreduce.Job: map 25% reduce 0%
16/08/23 01:05:07 INFO mapreduce.Job: map 100% reduce 0%
16/08/23 01:05:08 INFO mapreduce.Job: Job job_1471882959657_0001 completed successfully
16/08/23 01:05:08 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=529788
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=426
HDFS: Number of bytes written=171
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=102550
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=102550
Total vcore-seconds taken by all map tasks=102550
Total megabyte-seconds taken by all map tasks=105011200
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=426
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1227
CPU time spent (ms)=3640
Physical memory (bytes) snapshot=390111232
Virtual memory (bytes) snapshot=3376676864
Total committed heap usage (bytes)=74018816
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=171
16/08/23 01:05:08 INFO mapreduce.ImportJobBase: Transferred 171 bytes in 57.2488 seconds (2.987 bytes/sec)
16/08/23 01:05:08 INFO mapreduce.ImportJobBase: Retrieved 3 records.
16/08/23 01:05:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `TBLS` AS t LIMIT 1
16/08/23 01:05:08 INFO hive.HiveImport: Loading uploaded data into Hive
16/08/23 01:05:19 INFO hive.HiveImport:
16/08/23 01:05:19 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/var/local/hadoop/hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
16/08/23 01:05:19 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
16/08/23 01:05:19 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/var/local/hadoop/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/08/23 01:05:19 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/var/local/hadoop/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/08/23 01:05:19 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
16/08/23 01:05:19 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/23 01:05:31 INFO hive.HiveImport: OK
16/08/23 01:05:31 INFO hive.HiveImport: Time taken: 3.481 seconds
16/08/23 01:05:31 INFO hive.HiveImport: Loading data to table default.sqoophook2
16/08/23 01:05:33 INFO hive.HiveImport: Table default.sqoophook2 stats: [numFiles=4, totalSize=171]
16/08/23 01:05:33 INFO hive.HiveImport: OK
16/08/23 01:05:33 INFO hive.HiveImport: Time taken: 1.643 seconds
16/08/23 01:05:35 INFO hive.HiveImport: Hive import complete.
16/08/23 01:05:35 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
... View more
Labels:
07-08-2016
05:05 AM
If I want to deploy Atlas on a cluster, should I install Atlas on every node of the cluster? If not, which node is best for installing Atlas: the master node that runs the Hadoop NameNode, or some other node? And if I want to deploy Atlas with High Availability, should Atlas be installed on every machine of the cluster? A sketch of the HA settings is shown below.
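For reference, a sketch of the HA-related settings (these are the same properties that appear, commented out, in the atlas-application.properties posted later in this thread; the host names and the two-server layout are assumptions for illustration only):

# Atlas HA does not require an instance on every node: run the Atlas server on the
# chosen nodes and list them in atlas-application.properties on each of them.
atlas.server.ha.enabled=true
atlas.server.ids=id1,id2
atlas.server.address.id1=atlas-host-1:21000
atlas.server.address.id2=atlas-host-2:21000
atlas.server.ha.zookeeper.connect=zk-host:2181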
... View more
Labels:
- Apache Atlas
- Apache Hadoop
06-30-2016
03:19 AM
@Vadim Thank you very much. It really helped me understand the meaning of lineage. I have two simple questions:
Q1: If I use the Sqoop Bridge, does it mean I can send metadata from a DBMS (e.g. MySQL) directly to Atlas, without having to use Sqoop to move the data from MySQL into Hive first? In one word: with the Sqoop Bridge, can I deliver metadata from MySQL to Atlas without Hive?
Q2: When I create another table from an existing Hive table in the Hive CLI, it looks for the JAR files under an HDFS path, but these JAR files are located on the local file system, not in HDFS. How can I change the path of the required JAR files? The details of this question are here: https://community.hortonworks.com/questions/41898/using-hive-hook-file-does-not-exist-atlas-client-0.html I hope you can help me. Thank you very much.
... View more
06-29-2016
06:38 AM
Hi, @Ryan Cicak I can find the file atlas-client-0.7-incubating-SNAPSHOT.jar at this path: /usr/local/data-governance/apache-atlas-0.7-incubating-SNAPSHOT/hook/hive/atlas-client-0.7-incubating-SNAPSHOT.jar But why does it look for this file with the HDFS prefix hdfs://localhost:9000, i.e. hdfs://localhost:9000/usr/local/data-governance/apache-atlas-0.7-incubating-SNAPSHOT/hook/hive/atlas-client-0.7-incubating-SNAPSHOT.jar ?
How can I change the path of the JAR files from an HDFS path to a local file system path?
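A possible explanation and a sketch of two common workarounds (this is an assumption on my part, not something confirmed in this thread): when the auxiliary JAR paths carry no URI scheme, the MapReduce job submitter resolves them against the default file system (fs.defaultFS, here hdfs://localhost:9000), which is why a local path ends up being looked up on HDFS.

# Option A (assumption): give the aux jars an explicit local-filesystem scheme, e.g. in hive-site.xml
#   <property>
#     <name>hive.aux.jars.path</name>
#     <value>file:///usr/local/data-governance/apache-atlas-0.7-incubating-SNAPSHOT/hook/hive</value>
#   </property>

# Option B: put the hook jars on HDFS at the path the job submitter expects
hdfs dfs -mkdir -p /usr/local/data-governance/apache-atlas-0.7-incubating-SNAPSHOT/hook/hive
hdfs dfs -put /usr/local/data-governance/apache-atlas-0.7-incubating-SNAPSHOT/hook/hive/*.jar \
    /usr/local/data-governance/apache-atlas-0.7-incubating-SNAPSHOT/hook/hive/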
... View more
06-29-2016
05:13 AM
@Sindhu The tables were created in the Hive CLI, and I ran import-hive.sh from the $HIVE_HOME/bin directory. But there is another, more important question: https://community.hortonworks.com/questions/41898/using-hive-hook-file-does-not-exist-atlas-client-0.html In the link above, the result shows an error: File does not exist: hdfs://localhost:9000/usr/local/data-governance/apache-atlas-0.7-incubating-SNAPSHOT/hook/hive/atlas-client-0.7-incubating-SNAPSHOT.jar Could you help me find the file atlas-client-0.7-incubating-SNAPSHOT.jar ?
... View more
06-29-2016
05:04 AM
Hi, @Ryan Cicak I tried it in beeline, and the error is the same. It seems that the atlas-client-*.jar is missing. Where can I find the client jar? Here is the report: hadoop@eite:~$ beeline
Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Enter username for jdbc:hive2://localhost:10000/default: hadoop
Enter password for jdbc:hive2://localhost:10000/default: ******
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000/default> insert into brancha(full_name,ssn,location) values ('ryan', '111-222-333', 'chicago');
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO : Cleaning up the staging area file:/usr/local/hadoop/tmp/mapred/staging/hadoop1956486297/.staging/job_local1956486297_0002
ERROR : Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://localhost:9000/usr/local/data-governance/apache-atlas-0.7-incubating-SNAPSHOT/hook/hive/atlas-client-0.7-incubating-SNAPSHOT.jar)'
java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/usr/local/data-governance/apache-atlas-0.7-incubating-SNAPSHOT/hook/hive/atlas-client-0.7-incubating-SNAPSHOT.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:431)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=1)
... View more
06-28-2016
12:57 AM
@Ryan Cicak Hi, this demo helped me a lot. But when I executed the command: insert into brancha(full_name,ssn,location) values ('ryan', '111-222-333', 'chicago'); it reported an error like this: java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/usr/local/data-governance/apache-atlas-0.7-incubating-SNAPSHOT/hook/hive/atlas-client-0.7-incubating-SNAPSHOT.jar The details of this issue are posted in my other thread: https://community.hortonworks.com/questions/41898/using-hive-hook-file-does-not-exist-atlas-client-0.html Please check. I hope you can help me. Thank you very much.
... View more
06-27-2016
12:18 PM
After configuring the Hive Hook according to http://atlas.apache.org/Bridge-Hive.html, I could create a table in the Hive CLI. I could then find this table in the Atlas Web UI, but it showed that no lineage data was found. So I searched this community and found this tutorial by @Ryan Cicak: https://community.hortonworks.com/articles/36121/using-apache-atlas-to-view-data-lineage.html Following this tutorial, I ran the following command in the Hive CLI: create table brancha(full_name string, ssn string, location string); It executed successfully. But when I tried to run this: insert into brancha(full_name,ssn,location) values ('ryan', '111-222-333', 'chicago'); it reported an error like this: hive> insert into brancha(full_name,ssn,location) values ('ryan', '111-222-333', 'chicago');
Query ID = hadoop_20160627200051_392a732d-cf49-4d13-a10d-cea68fb32217
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/usr/local/data-governance/apache-atlas-0.7-incubating-SNAPSHOT/hook/hive/atlas-client-0.7-incubating-SNAPSHOT.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:431)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://localhost:9000/usr/local/data-governance/apache-atlas-0.7-incubating-SNAPSHOT/hook/hive/atlas-client-0.7-incubating-SNAPSHOT.jar)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
This means that the JAR file atlas-client-0.7-incubating-SNAPSHOT.jar doesn't exist at that location. I also looked for other solutions and ran a CTAS-style command: CREATE TABLE table1 AS SELECT * FROM table2; It reported the same error as above. To help diagnose this problem, I am posting my <atlas-package>/conf/atlas-application.properties in case it is useful: ######### Graph Database Configs #########
# Graph Storage
#atlas.graph.storage.backend=berkeleyje
#atlas.graph.storage.directory=${sys:atlas.home}/data/berkley
#Hbase as stoarge backend
atlas.graph.storage.backend=hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here - For more information refer http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2
atlas.graph.storage.hostname=localhost
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000
#Solr
#atlas.graph.index.search.backend=solr
# Solr cloud mode properties
#atlas.graph.index.search.solr.mode=cloud
#atlas.graph.index.search.solr.zookeeper-url=localhost:2181
#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr
# Graph Search Index
#ElasticSearch
atlas.graph.index.search.backend=elasticsearch
atlas.graph.index.search.directory=${sys:atlas.home}/data/es
atlas.graph.index.search.elasticsearch.client-only=false
atlas.graph.index.search.elasticsearch.local-mode=true
atlas.graph.index.search.elasticsearch.create.sleep=2000
######### Notification Configs #########
atlas.notification.embedded=true
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=localhost:9026
atlas.kafka.bootstrap.servers=localhost:9027
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.auto.offset.reset=smallest
atlas.kafka.hook.group.id=atlas
######### Hive Lineage Configs #########
# This models reflects the base super types for Data and Process
#atlas.lineage.hive.table.type.name=DataSet
#atlas.lineage.hive.process.type.name=Process
#atlas.lineage.hive.process.inputs.name=inputs
#atlas.lineage.hive.process.outputs.name=outputs
## Schema
atlas.lineage.hive.table.schema.query.hive_table=hive_table where name='%s'\, columns
atlas.lineage.hive.table.schema.query.Table=Table where name='%s'\, columns
## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443
######### Security Properties #########
# SSL config
atlas.enableTLS=false
#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks
#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks
# Authentication config
# enabled: true or false
atlas.http.authentication.enabled=false
# type: simple or kerberos
atlas.http.authentication.type=simple
######### Server Properties #########
atlas.rest.address=http://localhost:21000
# If enabled and set to true, this will run setup steps when the server starts
#atlas.server.run.setup.on.start=false
######### Entity Audit Configs #########
atlas.audit.hbase.tablename=ATLAS_ENTITY_AUDIT_EVENTS
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=localhost:2181
######### High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>
#### atlas.login.method {FILE,LDAP,AD} ####
atlas.login.method=FILE
### File path of users-credentials
atlas.login.credentials.file=${sys:atlas.home}/conf/users-credentials.properties
Most of the settings are defaults; I never changed them. I also never started HiveServer2 or the Hive metastore service. All the commands were executed in the Hive CLI (command-line interface).
... View more
Labels:
06-27-2016
10:57 AM
@Sindhu Thank you very much. After I imported the metadata from Hive into Atlas, the Atlas Web UI showed "No lineage data was found". In principle, it should show the lineage between Hive and Atlas. Should I run HiveServer2 or change the atlas-application.properties file?
... View more
06-27-2016
10:52 AM
@Divakar Annapureddy Thank you very much. After I downloaded the gson jar, it works. But I am wondering why this jar was missing. Did this happen while compiling Atlas with Maven? And after importing the metadata, the Atlas Web UI shows that there is no lineage data. Should I run HiveServer2?
... View more
06-27-2016
10:19 AM
@Joy I haven't downloaded HDP. Because I want to learn how to configure and use Atlas on its own, I compiled Atlas 0.7 with Maven and downloaded Hive separately. So I am wondering how to solve this other than downloading the jar file manually, or is the reason that the Maven build didn't include this jar?
... View more
06-27-2016
04:37 AM
I want to use the Hive Hook to import metadata automatically. So I set up hive-site.xml, exported HIVE_AUX_JARS_PATH, and copied atlas-application.properties to the Hive conf directory according to the official Atlas guide: http://atlas.apache.org/Bridge-Hive.html. But when I entered the Hive CLI and typed "show tables;" or other commands, it showed: NoClassDefFoundError: com/google/gson/GsonBuilder I want to know how to solve this. In my <atlas-conf>/atlas-application.properties, most settings are defaults; I never changed them. The file is shown below: ######### Graph Database Configs #########
# Graph Storage
#atlas.graph.storage.backend=berkeleyje
#atlas.graph.storage.directory=${sys:atlas.home}/data/berkley
#Hbase as stoarge backend
atlas.graph.storage.backend=hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here - For more information refer http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2
atlas.graph.storage.hostname=localhost
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000
#Solr
#atlas.graph.index.search.backend=solr
# Solr cloud mode properties
#atlas.graph.index.search.solr.mode=cloud
#atlas.graph.index.search.solr.zookeeper-url=localhost:2181
#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr
# Graph Search Index
#ElasticSearch
atlas.graph.index.search.backend=elasticsearch
atlas.graph.index.search.directory=${sys:atlas.home}/data/es
atlas.graph.index.search.elasticsearch.client-only=false
atlas.graph.index.search.elasticsearch.local-mode=true
atlas.graph.index.search.elasticsearch.create.sleep=2000
######### Notification Configs #########
atlas.notification.embedded=true
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=localhost:9026
atlas.kafka.bootstrap.servers=localhost:9027
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.auto.offset.reset=smallest
atlas.kafka.hook.group.id=atlas
######### Hive Lineage Configs #########
# This models reflects the base super types for Data and Process
#atlas.lineage.hive.table.type.name=DataSet
#atlas.lineage.hive.process.type.name=Process
#atlas.lineage.hive.process.inputs.name=inputs
#atlas.lineage.hive.process.outputs.name=outputs
## Schema
atlas.lineage.hive.table.schema.query.hive_table=hive_table where name='%s'\, columns
atlas.lineage.hive.table.schema.query.Table=Table where name='%s'\, columns
## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443
######### Security Properties #########
# SSL config
atlas.enableTLS=false
#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks
#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks
# Authentication config
# enabled: true or false
atlas.http.authentication.enabled=false
# type: simple or kerberos
atlas.http.authentication.type=simple
######### Server Properties #########
atlas.rest.address=http://localhost:21000
# If enabled and set to true, this will run setup steps when the server starts
#atlas.server.run.setup.on.start=false
######### Entity Audit Configs #########
atlas.audit.hbase.tablename=ATLAS_ENTITY_AUDIT_EVENTS
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=localhost:2181
######### High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>
#### atlas.login.method {FILE,LDAP,AD} ####
atlas.login.method=FILE
### File path of users-credentials
atlas.login.credentials.file=${sys:atlas.home}/conf/users-credentials.properties
Finally, I noticed that the official guide lists some settings (restated in the sketch below):
atlas.hook.hive.synchronous - boolean, true to run the hook synchronously. default false
atlas.hook.hive.numRetries - number of retries for notification failure. default 3
atlas.hook.hive.minThreads - core number of threads. default 5
atlas.hook.hive.maxThreads - maximum number of threads. default 5
atlas.hook.hive.keepAliveTime - keep alive time in msecs. default 10
atlas.hook.hive.queueSize - queue size for the threadpool. default 10000
Should I add these settings to atlas-application.properties? And should I start HiveServer2 and the Hive metastore service?
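If you do decide to set them, a minimal sketch of how these optional hook settings would appear in the atlas-application.properties that is copied into the Hive conf directory (the values below are simply the documented defaults restated, not values taken from this cluster):

atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.minThreads=5
atlas.hook.hive.maxThreads=5
atlas.hook.hive.keepAliveTime=10
atlas.hook.hive.queueSize=10000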
... View more
Labels:
06-27-2016
03:33 AM
@Vadim I am trying to import metadata from Hive into Atlas. When I create a table in the Hive CLI and run {atlas_home}/bin/import-hive.sh, the metadata is imported successfully, but the Atlas Web UI shows that no lineage data was found. In my opinion, it should show the lineage between Hive and Atlas, but it shows nothing. How can I get it to show lineage when I run {atlas_home}/bin/import-hive.sh? Thank you very much.
... View more
06-25-2016
01:21 AM
@Ayub Pathan These issues still exist. I posted the descriptions in the latest answer, please check.
... View more
06-25-2016
01:04 AM
@Ayub Pathan These issues still exist. First, I type "hive" and enter the Hive CLI. When I type "show tables;", it reports an error like this: hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
Then I export HIVE_AUX_JARS_PATH and start the HiveServer2 and metastore services with the commands "hiveserver2" and "hive --service metastore". They report an error like this: Exception in thread "main" java.lang.NoClassDefFoundError: com/google/gson/GsonBuilder The imported metadata also has no lineage data. What can I do next? My atlas-application.properties is shown below (most of the settings are defaults, I never changed them; should I remove the comment markers from the atlas.lineage.*.*.* settings?): ######### Graph Database Configs #########
# Graph Storage
#atlas.graph.storage.backend=berkeleyje
#atlas.graph.storage.directory=${sys:atlas.home}/data/berkley
#Hbase as stoarge backend
atlas.graph.storage.backend=hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here - For more information refer http://s3.thinkaurelius.com/docs/titan/current/hbase.html#_remote_server_mode_2
atlas.graph.storage.hostname=localhost
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000
#Solr
#atlas.graph.index.search.backend=solr
# Solr cloud mode properties
#atlas.graph.index.search.solr.mode=cloud
#atlas.graph.index.search.solr.zookeeper-url=localhost:2181
#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr
# Graph Search Index
#ElasticSearch
atlas.graph.index.search.backend=elasticsearch
atlas.graph.index.search.directory=${sys:atlas.home}/data/es
atlas.graph.index.search.elasticsearch.client-only=false
atlas.graph.index.search.elasticsearch.local-mode=true
atlas.graph.index.search.elasticsearch.create.sleep=2000
######### Notification Configs #########
atlas.notification.embedded=true
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=localhost:9026
atlas.kafka.bootstrap.servers=localhost:9027
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.auto.offset.reset=smallest
atlas.kafka.hook.group.id=atlas
######### Hive Lineage Configs #########
# This models reflects the base super types for Data and Process
#atlas.lineage.hive.table.type.name=DataSet
#atlas.lineage.hive.process.type.name=Process
#atlas.lineage.hive.process.inputs.name=inputs
#atlas.lineage.hive.process.outputs.name=outputs
## Schema
atlas.lineage.hive.table.schema.query.hive_table=hive_table where name='%s'\, columns
atlas.lineage.hive.table.schema.query.Table=Table where name='%s'\, columns
## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443
######### Security Properties #########
# SSL config
atlas.enableTLS=false
#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks
#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks
# Authentication config
# enabled: true or false
atlas.http.authentication.enabled=false
# type: simple or kerberos
atlas.http.authentication.type=simple
######### Server Properties #########
atlas.rest.address=http://localhost:21000
# If enabled and set to true, this will run setup steps when the server starts
#atlas.server.run.setup.on.start=false
######### Entity Audit Configs #########
atlas.audit.hbase.tablename=ATLAS_ENTITY_AUDIT_EVENTS
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=localhost:2181
######### High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>
#### atlas.login.method {FILE,LDAP,AD} ####
atlas.login.method=FILE
### File path of users-credentials
atlas.login.credentials.file=${sys:atlas.home}/conf/users-credentials.properties
... View more
06-24-2016
06:02 AM
Hi @Ayub Khan, I opened the Hive CLI and executed what you suggested below; the output is as follows: hive> create table sample (name String);
OK
Time taken: 0.92 seconds
hive> create table sample_ctas as select * from sample;
Query ID = hadoop_20160624135004_3a9c1e30-1c10-4433-bb58-7408471a0fd9
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2016-06-24 13:50:06,138 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local293851089_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://localhost:9000/user/hive/warehouse/.hive-staging_hive_2016-06-24_13-50-04_539_811658628250176459-1/-ext-10001
Moving data to: hdfs://localhost:9000/user/hive/warehouse/sample_ctas
Table default.sample_ctas stats: [numFiles=1, numRows=0, totalSize=0, rawDataSize=0]
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 45 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.831 seconds
This shows that I successfully created a table through CTAS. I then ran {ATLAS_HOME}/bin/import-hive.sh. After it succeeded, I visited http://localhost:21000 and found the table sample_ctas, but there is still no lineage data.
... View more
06-24-2016
02:15 AM
1 Kudo
Question 1: According to the official guide http://atlas.apache.org/, Data Classification is described as follows: "Import or define taxonomy business-oriented annotations for data" and "Define, annotate, and automate capture of relationships between data sets and underlying elements including source, target, and derivation processes". I am wondering how to implement the automated capture of relationships between data sets. And to define taxonomy annotations for data, we have to add tags to the data, right?
Question 2: How many ways are there to add a tag to data, other than through the Atlas Web UI in the browser?
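One alternative to the Web UI, as a sketch only (the endpoint and payload format below are assumptions about the 0.7-era REST API, not something confirmed in this thread; the PII trait type and the GUID are placeholders): a trait/tag can be attached to an existing entity over HTTP.

# Attach an existing trait type to an entity identified by its GUID (assumed v1 REST endpoint)
curl -u admin:admin -X POST \
  -H "Content-Type: application/json" \
  http://localhost:21000/api/atlas/entities/<entity-guid>/traits \
  -d '{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"PII","values":{}}'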
... View more
Labels:
06-23-2016
02:22 PM
1 Kudo
Question 1: What are the practical applications of Atlas auditing? The official guide http://atlas.apache.org/ describes the effect of auditing as follows: "Capture security access information for every application, process, and interaction with data" and "Capture the operational information for execution, steps, and activities". But these descriptions are rather abstract, I think. I am wondering what the specific use cases of auditing are.
Question 2: How do I configure and use auditing? I can't find configuration information in the official guide.
Question 3: I remember that the Web UI of older Atlas versions had an Audit tab that could be clicked in the browser, but I can't find an audit tab in the Web UI of Atlas 0.7. Why?
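For what it's worth, a sketch of the audit-related settings that already appear in the atlas-application.properties posted elsewhere in this thread (restated here for reference; whether these are the only audit knobs in this release is not something this thread confirms):

######### Entity Audit Configs #########
atlas.audit.hbase.tablename=ATLAS_ENTITY_AUDIT_EVENTS
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=localhost:2181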
... View more
Labels:
06-15-2016
02:05 PM
Thank you very much. It imports the metadata successfully. But I have other questions. 1. When I visit localhost:21000 and click the imported hive_table, it shows "No lineage data found". In my opinion, this Hive table was imported from Hive into Atlas, so it should show the lineage, but it shows nothing. How can I get it to show lineage data? 2. I tried to configure the Hive Hook according to the official guide: (1) set "hive.exec.post.hooks" and "atlas.cluster.name" in hive-site.xml; (2) add 'export HIVE_AUX_JARS_PATH=<atlas package>/hook/hive' to hive-env.sh; (3) copy <atlas-conf>/atlas-application.properties to the Hive conf directory. After this configuration, I typed "hive" to enter the Hive CLI and tried to create a Hive table, but it showed: hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.atlas.hive.hook.HiveHook)
java.lang.ClassNotFoundException: org.apache.atlas.hive.hook.HiveHook
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:60)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1309)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1293)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1516)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
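A quick diagnostic sketch (the same jar tvf | grep technique used earlier in this thread for the Sqoop jar; <atlas-home> is a placeholder for the Atlas install path): check that the directory exported in HIVE_AUX_JARS_PATH really contains a jar with the HiveHook class, and that the variable is visible to the shell that launches the Hive CLI.

export HIVE_AUX_JARS_PATH=<atlas-home>/hook/hive
echo $HIVE_AUX_JARS_PATH
ls <atlas-home>/hook/hive/*.jar
# Which jar, if any, contains the hook class the error complains about?
for j in <atlas-home>/hook/hive/*.jar; do
  jar tvf "$j" | grep -q "org/apache/atlas/hive/hook/HiveHook.class" && echo "found in $j"
done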
... View more
06-15-2016
07:24 AM
I couldn't find the "mysql client jar", but I tried adding mysql-connector-java-5.1.38-bin.jar to <atlas package>/bridge/hive/ (is mysql-connector-java-5.1.38-bin.jar the same as the mysql client jar?), and it now shows a different error: Exception in thread "main" com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
at com.sun.jersey.api.client.filter.HTTPBasicAuthFilter.handle(HTTPBasicAuthFilter.java:105)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.method(WebResource.java:634)
at org.apache.atlas.AtlasClient.callAPIWithResource(AtlasClient.java:1026)
at org.apache.atlas.AtlasClient.callAPIWithRetries(AtlasClient.java:642)
at org.apache.atlas.AtlasClient.callAPI(AtlasClient.java:1050)
at org.apache.atlas.AtlasClient.getType(AtlasClient.java:537)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.registerHiveDataModel(HiveMetaStoreBridge.java:510)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:551)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:998)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:934)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:852)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1302)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
... 11 more
Failed to import Hive Data Model!!!
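A sketch of a quick check (an assumption on my part, not part of the original post): a "Connection refused" from the Atlas client usually means nothing is listening at the configured atlas.rest.address, so verify the Atlas server is actually up before re-running import-hive.sh.

# Start the Atlas server if it is not running, then confirm the REST endpoint answers
<atlas-home>/bin/atlas_start.py
curl -u admin:admin http://localhost:21000/api/atlas/admin/version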
... View more