Created 10-17-2016 10:14 AM
I am trying to import an Oracle RDBMS table into Hive using the Sqoop --hive-import option. The Sqoop import itself ran fine, but at the end it errored out with: "Failed with exception java.util.ConcurrentModificationException FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask".
When I opened the Hive terminal, I could see the table created in the Hive database, but no records were inserted.
Below is the code:
sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" \
  --connect <jdbc:oracle:thin:@connectionstring:portno> \
  --table <tablename> --username <username> --password <password> \
  --hive-import \
  --hive-table <hivedb.hivetable> \
  --split-by <column> \
  -m 8
Do I need to set any parameters, or do Hive internal tables have such issues?
Created 01-28-2017 01:09 AM
Typically, for this type of problem, the approach to a solution is as follows:
1) Check the log on the node where Sqoop is running after executing your Sqoop command. If you cannot find a log after the command finishes, you can redirect the output to a file as follows:
# sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table t1 --hive-import --direct --hive-table t1 2>&1 | tee -a log
2) Control the parallelism in your Sqoop command as per your need; while troubleshooting, it is better to use a single mapper (-m 1), since a Sqoop import is a map-only job.
3) Finally, you can check your Hive config file and disable move-task parallelism by setting "hive.mv.files.thread=0".
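For step 3, a minimal sketch of the change in hive-site.xml (the property name is from the answer above; placement in hive-site.xml is the usual way to make it visible to the Hive client that Sqoop invokes):

```xml
<!-- hive-site.xml: setting hive.mv.files.thread to 0 disables
     parallel file moves in Hive's MoveTask -->
<property>
  <name>hive.mv.files.thread</name>
  <value>0</value>
</property>
```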
Thanks,
Surjya Sahoo
Created 01-29-2017 12:08 AM
This is a problem in the Hive move task (since fixed in HIVE-15355), which is called by Sqoop after the import into HDFS. So disabling move-task parallelism by adding the configuration parameter hive.mv.files.thread=0 is the right solution. That said, I would suggest using the --hcatalog-table option with the import, which allows for
1. better data fidelity
2. removing one intermediate step of landing the data on HDFS and then invoking the Hive client to do the import
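As a sketch of the suggested alternative (the connection string, credentials, and table names below are placeholders, not from the original post), the HCatalog variant of the import might look like:

```shell
# Hypothetical example: import directly into the Hive table via HCatalog,
# skipping the separate HDFS-landing + Hive-load step.
sqoop import \
  "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" \
  --connect jdbc:oracle:thin:@dbhost:1521:ORCL \
  --username scott --password-file /user/scott/.oracle.pw \
  --table EMPLOYEES \
  --hcatalog-database hivedb \
  --hcatalog-table employees \
  --split-by EMPLOYEE_ID \
  -m 4
```

Note that --hcatalog-table writes through HCatalog into the existing Hive table's storage format, so the Hive MoveTask is never invoked.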