Support Questions


Sqoop Hive Import failing

Rising Star

I am trying to import an Oracle RDBMS table into Hive using the Sqoop --hive-import option. The Sqoop import itself completes, but at the end it errors out with: "Failed with exception java.util.ConcurrentModificationException FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask".

When I opened a Hive terminal, I could see the table created in the Hive database, but no records were inserted.

Below is the command:

sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" \
  --connect <jdbc:oracle:thin:@connectionstring:portno> \
  --table tablename \
  --username <username> \
  --password <Password> \
  --hive-import \
  --hive-table <hivedb.hivetable> \
  --split-by <column> \
  -m 8

Do I need to set any parameters, or is this a known issue with Hive internal tables?

1 ACCEPTED SOLUTION

New Contributor

Typically, for this type of problem, the troubleshooting approach is as follows:

1) Check the log on the data node where Sqoop is running after executing your Sqoop command. If you cannot find a log after the command finishes, you can redirect the output to a file as follows:

# sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table t1 --hive-import --direct --hive-table t1 2>&1 | tee -a log

2) Control the parallelism in your Sqoop command as needed. It is safer to start with a single mapper (-m 1).

3) Finally, check your Hive configuration and disable move-task parallelism by setting "hive.mv.files.thread=0".
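Steps 1–3 above can be combined into a single re-run. This is only a sketch: the connection string, credentials, and table names below are placeholders, not from the original post, and whether a -D property passed to Sqoop reaches the embedded Hive invocation depends on your distribution — setting hive.mv.files.thread=0 in hive-site.xml is the reliable route.

```shell
# Re-run the import with a single mapper (step 2), capture the full
# log to a file (step 1), and attempt to pass the move-task setting
# from step 3 as a Hadoop -D property (placeholder connection details):
sqoop import \
  -Dhive.mv.files.thread=0 \
  --connect jdbc:oracle:thin:@//dbhost:1521/service \
  --username scott -P \
  --table EMPLOYEES \
  --hive-import \
  --hive-table staging.employees \
  -m 1 \
  2>&1 | tee -a sqoop_import.log
```

If the error persists with -D, add the property to hive-site.xml on the node running the import and retry.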

Thanks,

Surjya Sahoo



Expert Contributor

This is a problem in the Hive move task (since fixed in HIVE-15355), which is invoked by Sqoop after the import into HDFS. So disabling move-task parallelism by adding the configuration parameter hive.mv.files.thread=0 is the right solution. That said, I would suggest using the --hcatalog-table option with the import instead, which:

1. gives better data fidelity, and

2. removes the intermediate step of landing the data on HDFS and then invoking the Hive client to do the import.
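The suggestion above could look like the following sketch, using Sqoop's HCatalog options in place of --hive-import. The connection string, credentials, and database/table names are placeholders, not taken from the original post.

```shell
# Same Oracle import, written against HCatalog instead of --hive-import.
# Sqoop writes directly into the Hive table's storage via HCatalog,
# so no Hive move task is involved (placeholder connection details):
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/service \
  --username scott -P \
  --table EMPLOYEES \
  --hcatalog-database staging \
  --hcatalog-table employees \
  --create-hcatalog-table \
  -m 8
```

--create-hcatalog-table makes Sqoop create the Hive table if it does not already exist; drop it if the table is pre-created.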