Created 10-17-2016 10:14 AM
I am trying to import an Oracle RDBMS table into Hive using the Sqoop --hive-import option. The Sqoop import itself ran fine, but at the end it errored out with "Failed with exception java.util.ConcurrentModificationException FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask".
When I opened a Hive terminal, I could see the table created in the Hive database, but no records were inserted.
Below is the code:
sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" \ --connect <jdbc:oracle:thin:@connectionstring:portno> \ --table tablename --username <username> -password <Password> \ --hive-import \ --hive-table <hivedb.hivetable> \ --split-by <column> \ -m 8
Do I need to set any additional parameters, or is this a known issue with Hive internal (managed) tables?
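Since the MapReduce phase completes, the imported rows are usually still sitting in HDFS even though the Hive table shows no records; the MoveTask fails while moving them into the warehouse. A quick way to confirm (paths are placeholders; Sqoop's default staging directory is under the importing user's HDFS home, and the warehouse location depends on hive.metastore.warehouse.dir, /apps/hive/warehouse being the HDP default):
hdfs dfs -ls /user/<username>/<tablename>
hdfs dfs -ls /apps/hive/warehouse/<hivedb>.db/<hivetable>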
Created 10-17-2016 12:43 PM
Do you have the full stack trace? Also, what version of HDP/sqoop? Thanks
Created 10-21-2016 11:42 AM
I'm using HDP 2.5, Sqoop 1.4.6.
Full log:
$ sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" --connect jdbc:oracle:thin:@xxxxxx.xx.xxx.xxx:1111:XXXXXXX --table tablename --username <username> -password <password> --hive-import --hive-table <hivetable> --split-by <col> -m 8 Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME --
-- 16/10/21 07:25:01 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled. 16/10/21 07:25:01 INFO manager.SqlManager: Using default fetchSize of 1000 16/10/21 07:25:01 INFO tool.CodeGenTool: Beginning code generation 16/10/21 07:25:03 INFO manager.OracleManager: Time zone has been set to GMT 16/10/21 07:25:03 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "db"."tablename" t WHERE 1=0 16/10/21 07:25:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.0.0-1245/hadoop-mapreduce Note: /tmp/sqoop-<username>/compile/163383944ed0d448144da421e24c5571/tablenae.java uses or overrides a deprecated API. Note: Recompile with -Xlint:deprecation for details. 16/10/21 07:25:06 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-<username>/compile/163383944ed0d448144da421e24c5571/db.tablename.jar 16/10/21 07:25:06 INFO mapreduce.ImportJobBase: Beginning import of db.tablename 16/10/21 07:25:06 INFO manager.OracleManager: Time zone has been set to GMT 16/10/21 07:25:08 INFO impl.TimelineClientImpl: Timeline service address: http://xxxxxx.xx.xx.xxx:8188/ws/v1/timeline/ 16/10/21 07:25:08 INFO client.AHSProxy: Connecting to Application History server at xxxxxxx.xx.xxxxx.xxx/ipno:10200 16/10/21 07:25:08 WARN ipc.Client: Failed to connect to server: xxxxxxx.xx.xxxxx.xxx/ipno:8032: retries get failed -- -- at org.apache.hadoop.ipc.Client.call(Client.java:1449) at org.apache.hadoop.ipc.Client.call(Client.java:1396) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at com.sun.proxy.$Proxy23.getNewApplication(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:221) --
--
-- http://xxxxxxx.xx.xxxxx.xxx:8088/proxy/application_1476174512012_0126/ 16/10/21 07:25:12 INFO mapreduce.Job: Running job: job_1476174512012_0126 16/10/21 07:25:18 INFO mapreduce.Job: Job job_1476174512012_0126 running in uber mode : false 16/10/21 07:25:18 INFO mapreduce.Job: map 0% reduce 0% 16/10/21 07:25:25 INFO mapreduce.Job: map 10% reduce 0% 16/10/21 07:25:26 INFO mapreduce.Job: map 70% reduce 0% 16/10/21 07:25:27 INFO mapreduce.Job: map 90% reduce 0% 16/10/21 07:25:51 INFO mapreduce.Job: map 100% reduce 0% 16/10/21 07:25:51 INFO mapreduce.Job: Job job_1476174512012_0126 completed successfully 16/10/21 07:25:51 INFO mapreduce.Job: Counters: 30 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=1676345 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=1483 HDFS: Number of bytes written=32451988 HDFS: Number of read operations=40 HDFS: Number of large read operations=0 HDFS: Number of write operations=20 Job Counters Launched map tasks=10 Other local map tasks=10 Total time spent by all maps in occupied slots (ms)=81510 Total time spent by all reduces in occupied slots (ms)=0 Total time spent by all map tasks (ms)=81510 Total vcore-milliseconds taken by all map tasks=81510 Total megabyte-milliseconds taken by all map tasks=333864960 Map-Reduce Framework Map input records=116058 Map output records=116058 Input split bytes=1483 Spilled Records=0 -- GC time elapsed (ms)=769 CPU time spent (ms)=27350 Physical memory (bytes) snapshot=4567121920 Virtual memory (bytes) snapshot=56302190592 Total committed heap usage (bytes)=5829558272 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=32451988 16/10/21 07:25:51 INFO mapreduce.ImportJobBase: Transferred 30.9486 MB in 42.8346 seconds (739.8552 KB/sec) 16/10/21 07:25:51 INFO mapreduce.ImportJobBase: Retrieved 116058 records. 16/10/21 07:25:51 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners -- 16/10/21 07:25:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "db"."tablename" t WHERE 1=0 16/10/21 07:25:52 WARN hive.TableDefWriter: Column col1 had to be cast to a less precise type in Hive 16/10/21 07:25:52 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/usr/hdp/2.5.0.0-1245/hive/lib/hive-common-1.2.1000.2.5.0.0-1245.jar!/hive-log4j.properties OK Time taken: 1.168 seconds Loading data to table hivedb.hivetable Failed with exception java.util.ConcurrentModificationException FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
Created 10-17-2016 04:42 PM
@Gayathri Reddy G What user are you running sqoop as?
Created 10-18-2016 05:15 AM
@grajagopal It was an individual user account.
Created 10-17-2016 04:51 PM
Can you access Oracle on that port with that driver? Is there a firewall between the Sqoop machine and Oracle? Can you access Hive from that machine?
A good first test is simply to access Hive and access Oracle and make sure those aren't the issue.
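For example, a rough connectivity check (substitute your own connection string, credentials, and HiveServer2 host; sqoop eval runs an ad-hoc query against Oracle, and beeline confirms Hive access):
sqoop eval --connect jdbc:oracle:thin:@<host>:<port>:<sid> --username <username> -P --query "SELECT 1 FROM dual"
beeline -u jdbc:hive2://<hiveserver2-host>:10000 -e "SHOW DATABASES;"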
You can also do a simple check: it could be a permissions issue, very possibly in HDFS under /user/hive/warehouse/yourusername. Look at the output of:
hdfs dfs -ls /user/hive/warehouse
and compare those directories against your currently logged-in user. If you are admin or root you may not have HDFS write permissions, so you may need to switch to the hdfs user (e.g. sudo -u hdfs) and run:
hdfs dfs -chmod -R 777 /user/hive/warehouse/youruser
hdfs dfs -chown youruser /user/hive/warehouse/youruser
Created 10-21-2016 11:17 AM
Yes, I can access Oracle, and using Sqoop I can import into an HDFS directory by specifying --target-dir on the sqoop import. I can access Hive too; I created a database and a table.
In our cluster the Hive warehouse directory is /apps/hive/warehouse. Why would the username come into the warehouse directory? I can't see any user IDs under the warehouse directory.
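One way to confirm where the Hive table actually lives (placeholder database and table names; the Location line in the output is the HDFS directory the MoveTask writes into):
hive -e "DESCRIBE FORMATTED hivedb.hivetable;" | grep -i location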
Created 10-18-2016 02:40 PM
Please check that you have a user directory created on HDFS (/user/<user>) and make sure it's owned by that user. Also, try an INSERT OVERWRITE into that table to test the access.
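If the directory is missing, something along these lines would create it (run as the hdfs superuser; <user> is a placeholder):
sudo -u hdfs hdfs dfs -mkdir -p /user/<user>
sudo -u hdfs hdfs dfs -chown <user>:<user> /user/<user>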
Created 10-21-2016 11:03 AM
Yes, I do have access to that table. I tried "insert overwrite table <managed_table> select * from ext_table;" and it worked. But when I tried loading data from the HDFS path (the same path the external table points to) into the managed table, it failed with the same error.
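(Roughly what that load looked like, with placeholder path and table names:)
LOAD DATA INPATH '/path/to/ext_table_dir' INTO TABLE managed_table;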
Created 01-28-2017 01:09 AM
Typically, the approach to this type of problem is as follows:
1) Check the log on the node where Sqoop is running after executing your sqoop command. If you don't see a log after the command finishes, you can redirect the output to a file, for example:
# sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table t1 --hive-import --direct --hive-table t1 2>&1 | tee -a log
2) Control the parallelism in your sqoop command as needed; a Sqoop import is map-only, so it is better to use a single mapper (-m 1) while troubleshooting.
3) Finally, you can check your Hive configuration and disable move-task file parallelism by setting hive.mv.files.thread=0 (see the sketch below).
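A minimal sketch of that hive-site.xml entry (on HDP this would typically be set through Ambari in the Hive custom/advanced configs, then the client configs redeployed):
<property>
  <name>hive.mv.files.thread</name>
  <value>0</value>
</property>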
Thanks,
Surjya Sahoo