Atlas Sqoop lineage with Hive is not working

Contributor

Hi Team,

We are using HDP-2.6.5. We are configuring Sqoop and Hive lineage by following this doc: https://hortonworks.com/tutorial/cross-component-lineage-with-apache-atlas-across-apache-sqoop-hive-...

While running a sqoop import, we get the ClassNotFoundException below:

sqoop import --connect jdbc:mysql://vc-hdp-db001a.hdp.test.com/test --table test_table_sqoop1 --hive-import --hive-table test_hive_table4 --username root -P -m 1 --fetch-size 1
Warning: /usr/hdp/2.6.5.0-292/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/12/17 05:50:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.5.0-292
Enter password:
18/12/17 05:50:28 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/12/17 05:50:28 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
18/12/17 05:50:28 INFO manager.MySQLManager: Argument '--fetch-size 1' will probably get ignored by MySQL JDBC driver.
18/12/17 05:50:28 INFO tool.CodeGenTool: Beginning code generation
18/12/17 05:50:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test_table_sqoop1` AS t LIMIT 1
18/12/17 05:50:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test_table_sqoop1` AS t LIMIT 1
18/12/17 05:50:28 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.6.5.0-292/hadoop-mapreduce
Note: /tmp/sqoop-hdfs/compile/90ee7535be590b2e48c64709e9c0127d/test_table_sqoop1.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/12/17 05:50:29 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/90ee7535be590b2e48c64709e9c0127d/test_table_sqoop1.jar
18/12/17 05:50:29 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/12/17 05:50:29 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/12/17 05:50:29 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/12/17 05:50:29 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/12/17 05:50:29 INFO mapreduce.ImportJobBase: Beginning import of test_table_sqoop1
18/12/17 05:50:30 INFO client.AHSProxy: Connecting to Application History server at p-hdp-m-r08-02.hdp.test.com/10.10.33.22:10200
18/12/17 05:50:30 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
18/12/17 05:50:30 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
18/12/17 05:50:31 INFO db.DBInputFormat: Using read commited transaction isolation
18/12/17 05:50:31 INFO mapreduce.JobSubmitter: number of splits:1
18/12/17 05:50:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1544603908449_0008
18/12/17 05:50:32 INFO impl.YarnClientImpl: Submitted application application_1544603908449_0008
18/12/17 05:50:32 INFO mapreduce.Job: The url to track the job: http://p-hdp-m-r09-01.hdp.test.com:8088/proxy/application_1544603908449_0008/
18/12/17 05:50:32 INFO mapreduce.Job: Running job: job_1544603908449_0008
18/12/17 05:50:40 INFO mapreduce.Job: Job job_1544603908449_0008 running in uber mode : false
18/12/17 05:50:40 INFO mapreduce.Job:  map 0% reduce 0%
18/12/17 05:50:48 INFO mapreduce.Job:  map 100% reduce 0%
18/12/17 05:50:48 INFO mapreduce.Job: Job job_1544603908449_0008 completed successfully
18/12/17 05:50:48 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=172085
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=172
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=6151
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=6151
                Total vcore-milliseconds taken by all map tasks=6151
                Total megabyte-milliseconds taken by all map tasks=25194496
        Map-Reduce Framework
                Map input records=6
                Map output records=6
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=68
                CPU time spent (ms)=1220
                Physical memory (bytes) snapshot=392228864
                Virtual memory (bytes) snapshot=6079295488
                Total committed heap usage (bytes)=610795520
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=172
18/12/17 05:50:48 INFO mapreduce.ImportJobBase: Transferred 172 bytes in 18.2966 seconds (9.4006 bytes/sec)
18/12/17 05:50:48 INFO mapreduce.ImportJobBase: Retrieved 6 records.
18/12/17 05:50:48 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
18/12/17 05:50:48 WARN mapreduce.PublishJobData: Unable to publish import data to publisher org.apache.atlas.sqoop.hook.SqoopHook
java.lang.ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.sqoop.mapreduce.PublishJobData.publishJobData(PublishJobData.java:46)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:284)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:127)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:507)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
18/12/17 05:50:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test_table_sqoop1` AS t LIMIT 1
18/12/17 05:50:48 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/usr/hdp/2.6.5.0-292/hive/lib/hive-common-1.2.1000.2.6.5.0-292.jar!/hive-log4j.properties
OK
Time taken: 4.355 seconds
Loading data to table default.test_hive_table4
Table default.test_hive_table4 stats: [numFiles=1, numRows=0, totalSize=172, rawDataSize=0]
OK
Time taken: 3.085 seconds
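
For reference, the missing class ships with the Atlas sqoop bridge; a quick way to check whether the bridge jars and the hook config are visible to Sqoop (the paths below are my assumptions for a standard HDP 2.6.5 layout):

ls /usr/hdp/current/atlas-server/hook/sqoop/            # Atlas sqoop bridge jars, if installed
ls /usr/hdp/current/sqoop-client/lib/ | grep -i atlas   # should also appear on Sqoop's classpath
grep -A 1 sqoop.job.data.publish.class /etc/sqoop/conf/sqoop-site.xml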

How can we resolve this?

Please suggest. Thanks in advance.

Thanks,

Bhushan

1 ACCEPTED SOLUTION

Contributor

Thanks @Geoffrey Shelton Okot for researching this. I have resolved the issue by following the instructions in this link: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_command-line-installation/content/config...
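
For anyone hitting the same error: the gist of that procedure is to register the hook class with Sqoop and put the Atlas hook jars and config on Sqoop's classpath. Roughly (the exact paths below are my assumptions for a standard HDP 2.6.5 layout, so verify them against the doc):

# 1. Tell Sqoop to publish job data to the Atlas hook -- add to sqoop-site.xml:
#      <property>
#        <name>sqoop.job.data.publish.class</name>
#        <value>org.apache.atlas.sqoop.hook.SqoopHook</value>
#      </property>

# 2. Make the Atlas client config visible to Sqoop (assumed config locations):
cp /etc/atlas/conf/atlas-application.properties /etc/sqoop/conf/

# 3. Link the Atlas sqoop bridge jars into Sqoop's lib directory:
ln -s /usr/hdp/current/atlas-server/hook/sqoop/*.jar /usr/hdp/current/sqoop-client/lib/

# 4. Re-run the sqoop import; the SqoopHook should now load and publish lineage to Atlas.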


2 REPLIES

Master Mentor

@Bhushan Kandalkar

I have just validated the process and it works, including the sqoop import; please see the attached pdf. I suspect that Kafka is not installed or, if it is, that it is not started. My setup:

  • HDP 2.6.5.0-292
  • All Ranger plugins enabled except Kafka (no Kerberos)
  • Kafka running

[Screenshot: 96384-ranger-plugins.jpg]

Validate that Kafka is running; I didn't see the output below in your log:

18/12/17 13:37:08 INFO kafka.KafkaNotification: ==> KafkaNotification()
18/12/17 13:37:08 INFO kafka.KafkaNotification: <== KafkaNotification()
18/12/17 13:37:08 INFO hook.AtlasHook: Created Atlas Hook
18/12/17 13:37:12 INFO kafka.KafkaNotification: ==> KafkaNotification.createProducer()
18/12/17 13:37:12 INFO producer.ProducerConfig: ProducerConfig values:
        acks = 1
        batch.size = 16384
        bootstrap.servers = [nanyuki.kenya.ke:6667]
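
If you want to confirm that the broker and the Atlas hook topic are reachable, here is a quick check (assuming a default HDP Kafka install; replace <zookeeper-host> with your ZooKeeper quorum):

# List topics and look for ATLAS_HOOK, which the Atlas hooks publish to
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --list \
    --zookeeper <zookeeper-host>:2181 | grep ATLAS_HOOK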

Hope that helps, please revert


[Screenshot: test-lineage2.jpg]
