Atlas Sqoop lineage with Hive is not working


Hi Team,

We are using HDP-2.6.5 and are configuring Sqoop and Hive lineage by following the given doc:

While running sqoop import, we get the ClassNotFoundException shown below:

sqoop import --connect jdbc:mysql:// --table test_table_sqoop1 --hive-import --hive-table test_hive_table4 --username root -P -m 1 --fetch-size 1
Warning: /usr/hdp/ does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/12/17 05:50:21 INFO sqoop.Sqoop: Running Sqoop version:
Enter password:
18/12/17 05:50:28 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/12/17 05:50:28 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
18/12/17 05:50:28 INFO manager.MySQLManager: Argument '--fetch-size 1' will probably get ignored by MySQL JDBC driver.
18/12/17 05:50:28 INFO tool.CodeGenTool: Beginning code generation
18/12/17 05:50:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test_table_sqoop1` AS t LIMIT 1
18/12/17 05:50:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test_table_sqoop1` AS t LIMIT 1
18/12/17 05:50:28 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/
Note: /tmp/sqoop-hdfs/compile/90ee7535be590b2e48c64709e9c0127d/ uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/12/17 05:50:29 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/90ee7535be590b2e48c64709e9c0127d/test_table_sqoop1.jar
18/12/17 05:50:29 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/12/17 05:50:29 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/12/17 05:50:29 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/12/17 05:50:29 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/12/17 05:50:29 INFO mapreduce.ImportJobBase: Beginning import of test_table_sqoop1
18/12/17 05:50:30 INFO client.AHSProxy: Connecting to Application History server at
18/12/17 05:50:30 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
18/12/17 05:50:30 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
18/12/17 05:50:31 INFO db.DBInputFormat: Using read commited transaction isolation
18/12/17 05:50:31 INFO mapreduce.JobSubmitter: number of splits:1
18/12/17 05:50:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1544603908449_0008
18/12/17 05:50:32 INFO impl.YarnClientImpl: Submitted application application_1544603908449_0008
18/12/17 05:50:32 INFO mapreduce.Job: The url to track the job:
18/12/17 05:50:32 INFO mapreduce.Job: Running job: job_1544603908449_0008
18/12/17 05:50:40 INFO mapreduce.Job: Job job_1544603908449_0008 running in uber mode : false
18/12/17 05:50:40 INFO mapreduce.Job:  map 0% reduce 0%
18/12/17 05:50:48 INFO mapreduce.Job:  map 100% reduce 0%
18/12/17 05:50:48 INFO mapreduce.Job: Job job_1544603908449_0008 completed successfully
18/12/17 05:50:48 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=172085
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=172
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=6151
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=6151
                Total vcore-milliseconds taken by all map tasks=6151
                Total megabyte-milliseconds taken by all map tasks=25194496
        Map-Reduce Framework
                Map input records=6
                Map output records=6
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=68
                CPU time spent (ms)=1220
                Physical memory (bytes) snapshot=392228864
                Virtual memory (bytes) snapshot=6079295488
                Total committed heap usage (bytes)=610795520
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=172
18/12/17 05:50:48 INFO mapreduce.ImportJobBase: Transferred 172 bytes in 18.2966 seconds (9.4006 bytes/sec)
18/12/17 05:50:48 INFO mapreduce.ImportJobBase: Retrieved 6 records.
18/12/17 05:50:48 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
18/12/17 05:50:48 WARN mapreduce.PublishJobData: Unable to publish import data to publisher org.apache.atlas.sqoop.hook.SqoopHook
java.lang.ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
        at java.lang.ClassLoader.loadClass(
        at sun.misc.Launcher$AppClassLoader.loadClass(
        at java.lang.ClassLoader.loadClass(
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(
        at org.apache.sqoop.mapreduce.PublishJobData.publishJobData(
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(
        at org.apache.sqoop.manager.SqlManager.importTable(
        at org.apache.sqoop.manager.MySQLManager.importTable(
        at org.apache.sqoop.tool.ImportTool.importTable(
        at org.apache.sqoop.Sqoop.runSqoop(
        at org.apache.sqoop.Sqoop.runTool(
        at org.apache.sqoop.Sqoop.runTool(
        at org.apache.sqoop.Sqoop.main(
18/12/17 05:50:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test_table_sqoop1` AS t LIMIT 1
18/12/17 05:50:48 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/usr/hdp/!/
Time taken: 4.355 seconds
Loading data to table default.test_hive_table4
Table default.test_hive_table4 stats: [numFiles=1, numRows=0, totalSize=172, rawDataSize=0]
Time taken: 3.085 seconds

How can we resolve this?

Please suggest. Thanks in advance.





Thanks @Geoffrey Shelton Okot for researching this. I have resolved the issue by following the instructions given in this link:

@Bhushan Kandalkar

I have just validated the process and it works, especially the sqoop import; please see the attached PDF. I suspect that you either don't have Kafka installed or, if you do, it isn't started.

  • HDP
  • Ranger plugins all enabled except Kafka (no Kerberos)
  • Kafka running
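
Independently of Kafka, the ClassNotFoundException in the log above means the JVM that ran Sqoop could not load org.apache.atlas.sqoop.hook.SqoopHook, so it is worth checking whether any jar that Sqoop sees actually contains that class. Below is a minimal sketch of such a check. The helper name jars_containing and the example directory in the usage comment are assumptions for illustration, not part of Sqoop or Atlas; point it at whatever lib directory your Sqoop client really uses.

```python
import zipfile
from pathlib import Path

# Class name taken from the stack trace in the question.
HOOK_CLASS = "org.apache.atlas.sqoop.hook.SqoopHook"


def jars_containing(class_name, lib_dir):
    """Return the names of the *.jar files in lib_dir that contain class_name."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except (zipfile.BadZipFile, OSError):
            # Skip corrupt archives and dangling symlinks.
            continue
    return hits


# Hypothetical usage; the path is an assumption about a typical HDP layout:
# print(jars_containing(HOOK_CLASS, "/usr/hdp/current/sqoop-client/lib"))
```

An empty result for the Sqoop lib directory would be consistent with the error in the question: the hook class is named in the configuration, but no jar providing it is on the classpath.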


Validate that Kafka is running. I didn't see the output below in your log:

18/12/17 13:37:08 INFO kafka.KafkaNotification: ==> KafkaNotification()
18/12/17 13:37:08 INFO kafka.KafkaNotification: <== KafkaNotification()
18/12/17 13:37:08 INFO hook.AtlasHook: Created Atlas Hook
18/12/17 13:37:12 INFO kafka.KafkaNotification: ==>
18/12/17 13:37:12 INFO producer.ProducerConfig: ProducerConfig values:
acks = 1  
batch.size = 16384  
bootstrap.servers = []
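
One rough way to validate that a broker is at least listening is a plain TCP connection test against the address the hook is configured to use. This is only a sketch: the function name broker_reachable is made up for illustration, and the host and port in the usage comment are assumptions (6667 is the usual Kafka listener port on HDP; use the value from your own bootstrap.servers setting).

```python
import socket


def broker_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and connects within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Hypothetical usage; replace with your broker's host and port:
# print(broker_reachable("kafka-host.example.com", 6667))
```

A successful connection only shows that something is listening on that port. It does not prove that the Atlas hook can publish to its topic, so the KafkaNotification lines in the Sqoop output remain the better confirmation.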
