Support Questions


Sqoop hook doesn't work for atlas?

Rising Star

I installed Atlas and Sqoop separately (I am not using HDP).

After executing this command:

sqoop import --connect jdbc:mysql://master:3306/hive --username root --password admin --table TBLS --hive-import --hive-table sqoophook1

Sqoop reports that the import into Hive succeeded, with no errors.

Then I checked the Atlas UI and searched for the sqoop_process type, but no entities show up. Why?


Here is my configuration process:

Step 1: Set the following in <sqoop-conf>/sqoop-site.xml:

<property>
  <name>sqoop.job.data.publish.class</name>
  <value>org.apache.atlas.sqoop.hook.SqoopHook</value>
</property>

Step 2: Copy the <atlas-conf>/atlas-application.properties to <sqoop-conf>

Step 3: Link <atlas-home>/hook/sqoop/*.jar into the Sqoop lib directory.


Are these configuration steps wrong?

Here is the output:

sqoop import --connect jdbc:mysql://zte-1:3306/hive --username root --password admin --table TBLS --hive-import --hive-table sqoophook2
Warning: /var/local/hadoop/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /var/local/hadoop/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/08/23 01:04:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/08/23 01:04:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/08/23 01:04:04 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
16/08/23 01:04:04 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/08/23 01:04:05 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/08/23 01:04:05 INFO tool.CodeGenTool: Beginning code generation
16/08/23 01:04:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `TBLS` AS t LIMIT 1
16/08/23 01:04:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `TBLS` AS t LIMIT 1
16/08/23 01:04:06 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /var/local/hadoop/hadoop-2.6.0
Note: /tmp/sqoop-hdfs/compile/2606be5f25a97674311440065aac302d/TBLS.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/08/23 01:04:09 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/2606be5f25a97674311440065aac302d/TBLS.jar
16/08/23 01:04:09 WARN manager.MySQLManager: It looks like you are importing from mysql.
16/08/23 01:04:09 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
16/08/23 01:04:09 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
16/08/23 01:04:09 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
16/08/23 01:04:09 INFO mapreduce.ImportJobBase: Beginning import of TBLS
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/var/local/hadoop/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/var/local/hadoop/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/23 01:04:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/23 01:04:10 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/08/23 01:04:11 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/08/23 01:04:11 INFO client.RMProxy: Connecting to ResourceManager at zte-1/192.168.136.128:8032
16/08/23 01:04:16 INFO db.DBInputFormat: Using read commited transaction isolation
16/08/23 01:04:16 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`TBL_ID`), MAX(`TBL_ID`) FROM `TBLS`
16/08/23 01:04:17 INFO mapreduce.JobSubmitter: number of splits:4
16/08/23 01:04:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1471882959657_0001
16/08/23 01:04:19 INFO impl.YarnClientImpl: Submitted application application_1471882959657_0001
16/08/23 01:04:19 INFO mapreduce.Job: The url to track the job: http://zte-1:8088/proxy/application_1471882959657_0001/
16/08/23 01:04:19 INFO mapreduce.Job: Running job: job_1471882959657_0001
16/08/23 01:04:37 INFO mapreduce.Job: Job job_1471882959657_0001 running in uber mode : false
16/08/23 01:04:37 INFO mapreduce.Job:  map 0% reduce 0%
16/08/23 01:05:05 INFO mapreduce.Job:  map 25% reduce 0%
16/08/23 01:05:07 INFO mapreduce.Job:  map 100% reduce 0%
16/08/23 01:05:08 INFO mapreduce.Job: Job job_1471882959657_0001 completed successfully
16/08/23 01:05:08 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=529788
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=426
                HDFS: Number of bytes written=171
                HDFS: Number of read operations=16
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=8
        Job Counters
                Launched map tasks=4
                Other local map tasks=4
                Total time spent by all maps in occupied slots (ms)=102550
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=102550
                Total vcore-seconds taken by all map tasks=102550
                Total megabyte-seconds taken by all map tasks=105011200
        Map-Reduce Framework
                Map input records=3
                Map output records=3
                Input split bytes=426
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=1227
                CPU time spent (ms)=3640
                Physical memory (bytes) snapshot=390111232
                Virtual memory (bytes) snapshot=3376676864
                Total committed heap usage (bytes)=74018816
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=171
16/08/23 01:05:08 INFO mapreduce.ImportJobBase: Transferred 171 bytes in 57.2488 seconds (2.987 bytes/sec)
16/08/23 01:05:08 INFO mapreduce.ImportJobBase: Retrieved 3 records.
16/08/23 01:05:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `TBLS` AS t LIMIT 1
16/08/23 01:05:08 INFO hive.HiveImport: Loading uploaded data into Hive
16/08/23 01:05:19 INFO hive.HiveImport:
16/08/23 01:05:19 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/var/local/hadoop/hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
16/08/23 01:05:19 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
16/08/23 01:05:19 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/var/local/hadoop/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/08/23 01:05:19 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/var/local/hadoop/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/08/23 01:05:19 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
16/08/23 01:05:19 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/23 01:05:31 INFO hive.HiveImport: OK
16/08/23 01:05:31 INFO hive.HiveImport: Time taken: 3.481 seconds
16/08/23 01:05:31 INFO hive.HiveImport: Loading data to table default.sqoophook2
16/08/23 01:05:33 INFO hive.HiveImport: Table default.sqoophook2 stats: [numFiles=4, totalSize=171]
16/08/23 01:05:33 INFO hive.HiveImport: OK
16/08/23 01:05:33 INFO hive.HiveImport: Time taken: 1.643 seconds
16/08/23 01:05:35 INFO hive.HiveImport: Hive import complete.
16/08/23 01:05:35 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
1 ACCEPTED SOLUTION

Rising Star

@Ethan Hsieh

Could you also confirm whether sqoop-site.xml has the REST address of the Atlas server configured?
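As a sketch only (the host and port are placeholders, and the property name should be verified against the docs for your Atlas version), the entry in question would look something like:

```xml
<!-- Placeholder host/port; verify atlas.rest.address against your Atlas docs. -->
<property>
  <name>atlas.rest.address</name>
  <value>http://atlas-server-host:21000</value>
</property>
```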

A sample configuration is available here.


17 REPLIES

Rising Star

@Ethan Hsieh

It looks like this version of Sqoop does not support integration with Atlas. You may have to upgrade to sqoop-1.4.6.2.3.99.1-5.jar from the sandbox (HDP 2.4.0), or use sqoop-1.4.7 or later from Apache.

Do let me know if the upgrade resolves the issue.

Rising Star

@ckrishnakumar

Thank you very much, but I can't find sqoop-1.4.7 on the official website: http://sqoop.apache.org/

The website shows that the latest stable release is 1.4.6, and it provides no download link for sqoop-1.4.7.

Could you give me a link to download the 1.4.7 version, or send the jar to my email: dreamcoding@outlook.com?

Rising Star

@Ethan Hsieh I have sent you the jar file. You can also build the jar yourself by cloning the Sqoop git repo: https://github.com/apache/sqoop.git

Details of how to compile are provided in https://github.com/apache/sqoop/blob/trunk/COMPILING.txt

Rising Star

@Chethana Krishnakumar

Thank you very much. Dropping the 1.4.7 jar into sqoop-1.4.6 does solve the problem. But I am worried that mixing versions will cause subtle problems later, so I see several options:

1. I found that the version of Sqoop in HDP is 1.4.6, but as I mentioned before, the sqoop-1.4.6 obtained from the official site is not complete. Could you give me a complete build of 1.4.6?

2. Alternatively, can you provide me a full distribution of sqoop-1.4.7, not just the 1.4.7 jar?

3. I even tried the latest release, sqoop-1.99.7, but the official documentation only covers importing data from a relational database into HDFS. I want to know the steps for using it to import data from a relational database into Hive.

Rising Star

@Ethan Hsieh

1. You will now be able to find the sqoop hook with HDP 2.5: http://hortonworks.com/tech-preview-hdp-2-5/

2. I could provide you with the full version, but that may not be a clean fix. Please build Sqoop from the latest Apache branch, which has all the changes.

3. Could you please post this as a separate question, since it relates to the Sqoop client?

Expert Contributor

The Sqoop hook for Atlas is not part of the HDP 2.4.0 release.

Contributor

@Chethana Krishnakumar

I have the same question. I used ant to compile the project downloaded from GitHub, but when I used Sqoop to import data from MySQL to Hive, the data couldn't be imported into Hive and the Atlas hook didn't work. So I want to know: was the 1.4.7 jar you sent built from that same GitHub source?

Contributor

(Screenshot attached: 7823-捕获.png)

When the job finished, it gave me these messages, but I can't understand them.