<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Sqoop Hive Import failing in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135338#M43703</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/12150/reddy-gayathri.html" nodeid="12150"&gt;@Gayathri Reddy G&lt;/A&gt; Which user are you running Sqoop as?&lt;/P&gt;</description>
    <pubDate>Mon, 17 Oct 2016 23:42:02 GMT</pubDate>
    <dc:creator>grajagopal</dc:creator>
    <dc:date>2016-10-17T23:42:02Z</dc:date>
    <item>
      <title>Sqoop Hive Import failing</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135336#M43701</link>
      <description>&lt;P&gt;I am trying to import an Oracle RDBMS table into Hive using the Sqoop --hive-import option. The Sqoop import itself completed, but at the end it errored out with &lt;STRONG&gt;&lt;EM&gt;"Failed with exception java.util.ConcurrentModificationException
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask".&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;When I opened the Hive terminal, I could see the table created in the Hive database, but no records were inserted.&lt;/P&gt;&lt;P&gt;Below is the command:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" \
--connect &amp;lt;jdbc:oracle:thin:@connectionstring:portno&amp;gt; \
--table tablename --username &amp;lt;username&amp;gt; -password &amp;lt;Password&amp;gt; \
--hive-import \
--hive-table &amp;lt;hivedb.hivetable&amp;gt; \
--split-by &amp;lt;column&amp;gt; \
-m 8&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Do I need to set any parameters, or do Hive internal (managed) tables have such issues?&lt;/P&gt;</description>
      <pubDate>Mon, 17 Oct 2016 17:14:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135336#M43701</guid>
      <dc:creator>reddy_gayathri</dc:creator>
      <dc:date>2016-10-17T17:14:47Z</dc:date>
    </item>
    <item>
      <title>Re: Sqoop Hive Import failing</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135337#M43702</link>
      <description>&lt;P&gt;Do you have the full stack trace? Also, what version of HDP/sqoop? Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 17 Oct 2016 19:43:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135337#M43702</guid>
      <dc:creator>clukasik</dc:creator>
      <dc:date>2016-10-17T19:43:38Z</dc:date>
    </item>
    <item>
      <title>Re: Sqoop Hive Import failing</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135338#M43703</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/12150/reddy-gayathri.html" nodeid="12150"&gt;@Gayathri Reddy G&lt;/A&gt; Which user are you running Sqoop as?&lt;/P&gt;</description>
      <pubDate>Mon, 17 Oct 2016 23:42:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135338#M43703</guid>
      <dc:creator>grajagopal</dc:creator>
      <dc:date>2016-10-17T23:42:02Z</dc:date>
    </item>
    <item>
      <title>Re: Sqoop Hive Import failing</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135339#M43704</link>
      <description>&lt;P&gt;Can you access Oracle on that port with that driver? Is there a firewall between the Sqoop machine and Oracle? Can you access Hive from that machine?&lt;/P&gt;&lt;P&gt;A good first test is simply to access Hive and access Oracle separately and make sure those aren't the issues. You can also consult:&lt;/P&gt;&lt;P&gt;&lt;A href="http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_how_the_standard_oracle_manager_works_for_imports" target="_blank"&gt;http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_how_the_standard_oracle_manager_works_for_imports&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_data-access/content/using_sqoop_to_move_data_into_hive.html" target="_blank"&gt;http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_data-access/content/using_sqoop_to_move_data_into_hive.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_dataintegration/content/ch_using-sqoop.html" target="_blank"&gt;https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_dataintegration/content/ch_using-sqoop.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;It could also be a permissions issue, very possibly in HDFS. Check the hdfs dfs -ls output for /user/hive/warehouse and /user/hive/warehouse/yourusername against your currently logged-in user; if you are admin or root, you may not have HDFS write permissions. You may need to run, as the hdfs superuser:&lt;/P&gt;&lt;P&gt;sudo -u hdfs hdfs dfs -chmod -R 777 /user/hive/warehouse/youruser
sudo -u hdfs hdfs dfs -chown youruser /user/hive/warehouse/youruser&lt;/P&gt;
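&lt;P&gt;For example, a quick way to test both ends from the Sqoop machine (a minimal sketch; host, port, SID, and username are placeholders, not values from this thread):&lt;/P&gt;&lt;P&gt;# verify Oracle connectivity and credentials with a trivial query
sqoop eval --connect jdbc:oracle:thin:@host:port:SID --username user -P --query "SELECT 1 FROM dual"
# verify Hive access and HDFS permissions for the current user
hive -e "SHOW DATABASES;"
hdfs dfs -ls /user/hive/warehouse&lt;/P&gt;</description>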
      <pubDate>Mon, 17 Oct 2016 23:51:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135339#M43704</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2016-10-17T23:51:38Z</dc:date>
    </item>
    <item>
      <title>Re: Sqoop Hive Import failing</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135340#M43705</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/33/grajagopal.html" nodeid="33"&gt;@grajagopal&lt;/A&gt; it was an individual user account.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Oct 2016 12:15:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135340#M43705</guid>
      <dc:creator>reddy_gayathri</dc:creator>
      <dc:date>2016-10-18T12:15:28Z</dc:date>
    </item>
    <item>
      <title>Re: Sqoop Hive Import failing</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135341#M43706</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/12150/reddy-gayathri.html" nodeid="12150"&gt;@Gayathri Reddy G&lt;/A&gt;&lt;P&gt;Please check that the user's home directory exists on HDFS (/user/&amp;lt;user&amp;gt;) and make sure it is owned by that user. Also, try an insert overwrite into that table to test write access.&lt;/P&gt;
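&lt;P&gt;For example (a minimal sketch; the user, database, table, and source names are placeholders):&lt;/P&gt;&lt;P&gt;hdfs dfs -ls /user | grep youruser
# create and hand over the home directory if it is missing
sudo -u hdfs hdfs dfs -mkdir -p /user/youruser
sudo -u hdfs hdfs dfs -chown youruser:hdfs /user/youruser&lt;/P&gt;&lt;P&gt;-- then, in Hive, test write access to the table
INSERT OVERWRITE TABLE hivedb.hivetable SELECT * FROM some_source_table;&lt;/P&gt;</description>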
      <pubDate>Tue, 18 Oct 2016 21:40:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135341#M43706</guid>
      <dc:creator>grajagopal</dc:creator>
      <dc:date>2016-10-18T21:40:14Z</dc:date>
    </item>
    <item>
      <title>Re: Sqoop Hive Import failing</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135342#M43707</link>
      <description>&lt;P&gt;Yes, I do have access to that table. I tried "insert overwrite table &amp;lt;managed_table&amp;gt; select * from ext_table;", and that worked. But when I tried loading data from the HDFS path (the same path ext_table points to in the previous query) into managed_table, it failed with the same error.&lt;/P&gt;
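&lt;P&gt;For reference, a minimal sketch of the two statements being compared (table and path names are placeholders):&lt;/P&gt;&lt;P&gt;-- worked: rewrites the managed table from a query
INSERT OVERWRITE TABLE managed_table SELECT * FROM ext_table;
-- failed with the same ConcurrentModificationException: moves the files under the source path into the table directory
LOAD DATA INPATH '/path/to/ext_table/data' OVERWRITE INTO TABLE managed_table;&lt;/P&gt;</description>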
      <pubDate>Fri, 21 Oct 2016 18:03:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135342#M43707</guid>
      <dc:creator>reddy_gayathri</dc:creator>
      <dc:date>2016-10-21T18:03:14Z</dc:date>
    </item>
    <item>
      <title>Re: Sqoop Hive Import failing</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135343#M43708</link>
      <description>&lt;P&gt;Yes, I can access Oracle, and with Sqoop I can import into an HDFS directory by specifying --target-dir on the sqoop import. I can access Hive too; I created a database and a table.&lt;/P&gt;&lt;P&gt;In our cluster the Hive warehouse directory is /apps/hive/warehouse. Why would a username appear under the warehouse directory? I can't see any user IDs under the warehouse directory.&lt;/P&gt;</description>
      <pubDate>Fri, 21 Oct 2016 18:17:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135343#M43708</guid>
      <dc:creator>reddy_gayathri</dc:creator>
      <dc:date>2016-10-21T18:17:14Z</dc:date>
    </item>
    <item>
      <title>Re: Sqoop Hive Import failing</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135344#M43709</link>
      <description>&lt;P&gt;I'm using HDP 2.5, Sqoop 1.4.6.&lt;/P&gt;&lt;P&gt;Full log:&lt;/P&gt;&lt;P&gt;$ sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" --connect jdbc:oracle:thin:@xxxxxx.xx.xxx.xxx:1111:XXXXXXX --table tablename --username &amp;lt;username&amp;gt; -password &amp;lt;password&amp;gt; --hive-import --hive-table &amp;lt;hivetable&amp;gt; --split-by &amp;lt;col&amp;gt; -m 8
Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME --&lt;/P&gt;&lt;P&gt;--
16/10/21 07:25:01 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
16/10/21 07:25:01 INFO manager.SqlManager: Using default fetchSize of 1000
16/10/21 07:25:01 INFO tool.CodeGenTool: Beginning code generation
16/10/21 07:25:03 INFO manager.OracleManager: Time zone has been set to GMT
16/10/21 07:25:03 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "db"."tablename" t WHERE 1=0
16/10/21 07:25:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.0.0-1245/hadoop-mapreduce
Note: /tmp/sqoop-&amp;lt;username&amp;gt;/compile/163383944ed0d448144da421e24c5571/tablename.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/10/21 07:25:06 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-&amp;lt;username&amp;gt;/compile/163383944ed0d448144da421e24c5571/db.tablename.jar
16/10/21 07:25:06 INFO mapreduce.ImportJobBase: Beginning import of db.tablename
16/10/21 07:25:06 INFO manager.OracleManager: Time zone has been set to GMT
16/10/21 07:25:08 INFO impl.TimelineClientImpl: Timeline service address: &lt;A href="http://xxxxxx.xx.xx.xxx:8188/ws/v1/timeline/"&gt;http://xxxxxx.xx.xx.xxx:8188/ws/v1/timeline/&lt;/A&gt;
16/10/21 07:25:08 INFO client.AHSProxy: Connecting to Application History server at xxxxxxx.xx.xxxxx.xxx/ipno:10200
16/10/21 07:25:08 WARN ipc.Client: Failed to connect to server: xxxxxxx.xx.xxxxx.xxx/ipno:8032: retries get failed --
--
  at org.apache.hadoop.ipc.Client.call(Client.java:1449)
  at org.apache.hadoop.ipc.Client.call(Client.java:1396)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
  at com.sun.proxy.$Proxy23.getNewApplication(Unknown Source)
  at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:221)
 --&lt;/P&gt;&lt;P&gt;--&lt;/P&gt;&lt;P&gt;--
 &lt;A href="http://xxxxxxx.xx.xxxxx.xxx:8088/proxy/application_1476174512012_0126/"&gt;http://xxxxxxx.xx.xxxxx.xxx:8088/proxy/application_1476174512012_0126/&lt;/A&gt;
16/10/21 07:25:12 INFO mapreduce.Job: Running job: job_1476174512012_0126
16/10/21 07:25:18 INFO mapreduce.Job: Job job_1476174512012_0126 running in uber mode : false
16/10/21 07:25:18 INFO mapreduce.Job:  map 0% reduce 0%
16/10/21 07:25:25 INFO mapreduce.Job:  map 10% reduce 0%
16/10/21 07:25:26 INFO mapreduce.Job:  map 70% reduce 0%
16/10/21 07:25:27 INFO mapreduce.Job:  map 90% reduce 0%
16/10/21 07:25:51 INFO mapreduce.Job:  map 100% reduce 0%
16/10/21 07:25:51 INFO mapreduce.Job: Job job_1476174512012_0126 completed successfully
16/10/21 07:25:51 INFO mapreduce.Job: Counters: 30
  File System Counters
  FILE: Number of bytes read=0
  FILE: Number of bytes written=1676345
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=1483
  HDFS: Number of bytes written=32451988
  HDFS: Number of read operations=40
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=20
  Job Counters
  Launched map tasks=10
  Other local map tasks=10
  Total time spent by all maps in occupied slots (ms)=81510
  Total time spent by all reduces in occupied slots (ms)=0
  Total time spent by all map tasks (ms)=81510
  Total vcore-milliseconds taken by all map tasks=81510
  Total megabyte-milliseconds taken by all map tasks=333864960
  Map-Reduce Framework
  Map input records=116058
  Map output records=116058
  Input split bytes=1483
  Spilled Records=0
 --
  GC time elapsed (ms)=769
  CPU time spent (ms)=27350
  Physical memory (bytes) snapshot=4567121920
  Virtual memory (bytes) snapshot=56302190592
  Total committed heap usage (bytes)=5829558272
  File Input Format Counters
  Bytes Read=0
  File Output Format Counters
  Bytes Written=32451988
16/10/21 07:25:51 INFO mapreduce.ImportJobBase: Transferred 30.9486 MB in 42.8346 seconds (739.8552 KB/sec)
16/10/21 07:25:51 INFO mapreduce.ImportJobBase: Retrieved 116058 records.
16/10/21 07:25:51 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
--
16/10/21 07:25:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "db"."tablename" t WHERE 1=0
16/10/21 07:25:52 WARN hive.TableDefWriter: Column col1 had to be cast to a less precise type in Hive
16/10/21 07:25:52 INFO hive.HiveImport: Loading uploaded data into Hive&lt;/P&gt;&lt;P&gt;Logging initialized using configuration in jar:file:/usr/hdp/2.5.0.0-1245/hive/lib/hive-common-1.2.1000.2.5.0.0-1245.jar!/hive-log4j.properties
OK
Time taken: 1.168 seconds
Loading data to table hivedb.hivetable
Failed with exception java.util.ConcurrentModificationException
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask&lt;/P&gt;</description>
      <pubDate>Fri, 21 Oct 2016 18:42:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135344#M43709</guid>
      <dc:creator>reddy_gayathri</dc:creator>
      <dc:date>2016-10-21T18:42:57Z</dc:date>
    </item>
    <item>
      <title>Re: Sqoop Hive Import failing</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135345#M43710</link>
      <description>&lt;P&gt;Typically, the approach to this type of problem is as follows:&lt;/P&gt;&lt;P&gt;1) Check the log on the data node where Sqoop is running after executing your sqoop command. If you cannot find a log after the command runs, you can redirect the output to a file as follows:&lt;/P&gt;&lt;P&gt;# sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table t1 --hive-import --direct --hive-table t1 2&amp;gt;&amp;amp;1 | tee -a log&lt;/P&gt;&lt;P&gt;2) Control the parallelism of your sqoop command as needed; a Sqoop import is map-only, so it is better to use a single mapper (-m 1) while debugging.&lt;/P&gt;&lt;P&gt;3) Finally, you can disable move-task parallelism in your Hive configuration by setting "hive.mv.files.thread=0".&lt;/P&gt;
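&lt;P&gt;For example, in hive-site.xml (a minimal sketch of the property from step 3; you may also be able to set it per session with "set hive.mv.files.thread=0;"):&lt;/P&gt;&lt;P&gt;&amp;lt;property&amp;gt;
  &amp;lt;name&amp;gt;hive.mv.files.thread&amp;lt;/name&amp;gt;
  &amp;lt;value&amp;gt;0&amp;lt;/value&amp;gt;
&amp;lt;/property&amp;gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Surjya Sahoo&lt;/P&gt;</description>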
      <pubDate>Sat, 28 Jan 2017 09:09:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135345#M43710</guid>
      <dc:creator>sahoo_surjya</dc:creator>
      <dc:date>2017-01-28T09:09:23Z</dc:date>
    </item>
    <item>
      <title>Re: Sqoop Hive Import failing</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135346#M43711</link>
      <description>&lt;P&gt;This is a problem in the Hive move task (which has since been fixed in HIVE-15355), which is called by Sqoop after the import into HDFS. So disabling move-task parallelism by adding the configuration parameter hive.mv.files.thread=0 is the right solution. That said, I would suggest using the --hcatalog-table option with the import, which allows for:&lt;/P&gt;&lt;P&gt;1. better data fidelity&lt;/P&gt;&lt;P&gt;2. removing the intermediate step of landing data on HDFS and then invoking the Hive client to do the import&lt;/P&gt;
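&lt;P&gt;For example, a minimal sketch adapted from the original command (connection string, table, database, and split column are placeholders):&lt;/P&gt;&lt;P&gt;sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" \
  --connect jdbc:oracle:thin:@host:port:SID \
  --table TABLENAME --username user -P \
  --hcatalog-database hivedb --hcatalog-table hivetable \
  --create-hcatalog-table \
  --split-by COL -m 8&lt;/P&gt;&lt;P&gt;Note that --hcatalog-table replaces --hive-import/--hive-table, so Hive's MoveTask is not involved.&lt;/P&gt;</description>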
      <pubDate>Sun, 29 Jan 2017 08:08:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Sqoop-Hive-Import-failing/m-p/135346#M43711</guid>
      <dc:creator>vranganathan</dc:creator>
      <dc:date>2017-01-29T08:08:37Z</dc:date>
    </item>
  </channel>
</rss>