
Using Sqoop API to import DB to Hive



I have the following Sqoop command that works in my environment. The remote table gets imported into Hadoop and I get an AVSC schema file:

sqoop import --table users \
  --connect jdbc:mysql://myip:3306/rec_db \
  --username=root \
  --password=mypwd \
  --compression-codec=snappy \
  --as-avrodatafile \
  --warehouse-dir=/user/hive/warehouse

Now I am trying to convert that to Java code using the Sqoop API. This is the code so far:

import com.cloudera.sqoop.SqoopOptions;
import com.cloudera.sqoop.tool.ImportTool;

public class SqoopPoC1 {
    public static void main(String[] args) {
        SqoopOptions options = new SqoopOptions();
        options.setConnectString("jdbc:mysql://myip:3306/rec_db");
        options.setTableName("users");
        options.setUsername("root");
        options.setPassword("mypwd");
        //options.setCompressionCodec("snappy");
        options.setNumMappers(1); // default value is 4
        int ret = new ImportTool().run(options);
    }
}


This copies the remote table to a regular file on the local filesystem (not into HDFS). I have tried specifying setTargetDir and setWarehouseDir, but everything has failed so far. I get errors like:

2015-08-27 20:25:54,738 WARN  [Thread-19] mapred.LocalJobRunner - job_local2044679838_0001
java.lang.Exception: Mkdirs failed to create file:/user/hive/warehouse/_temporary/0/_temporary/attempt_local2044679838_0001_m_000000_0 (exists=false, cwd=file:/home/cloudera)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(
    at org.apache.hadoop.mapred.LocalJobRunner$

or the one in this other SO question.
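From the file:/ scheme in the error, my guess is that the job is falling back to the local filesystem because the cluster configuration is not on the classpath. Below is a minimal sketch of what I am planning to try: passing a Configuration with fs.defaultFS pointed at the NameNode. The hdfs://quickstart.cloudera:8020 address is my assumption for the QuickStart VM; adjust to your setup.

import org.apache.hadoop.conf.Configuration;
import com.cloudera.sqoop.SqoopOptions;
import com.cloudera.sqoop.tool.ImportTool;

public class SqoopPoC2 {
    public static void main(String[] args) {
        // Point the job at HDFS instead of the local filesystem.
        // hdfs://quickstart.cloudera:8020 is an assumed NameNode address.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020");

        SqoopOptions options = new SqoopOptions(conf);
        options.setConnectString("jdbc:mysql://myip:3306/rec_db");
        options.setTableName("users");
        options.setUsername("root");
        options.setPassword("mypwd");
        options.setNumMappers(1);
        options.setWarehouseDir("/user/hive/warehouse"); // HDFS path, as in the CLI run
        int ret = new ImportTool().run(options);
        System.exit(ret);
    }
}

If I understand the API correctly, calling options.setConf(conf) on an existing SqoopOptions should have the same effect as passing the Configuration to the constructor.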


I am using Sqoop1, and one of the methods is deprecated, but as far as I have seen the Sqoop2 API is not as flexible (I read that in answers to other SO questions, but I can't find them now).

I would appreciate advice on how best to proceed to use Sqoop and Hive programmatically.

I have also seen on SO that the recommended approach is to pass the arguments to a shell-like method, as in this other old question; I am hoping things have improved since then. I also found this similar question.
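If I understand that approach correctly, it would look roughly like the sketch below, reusing the arguments from my working CLI invocation (the first element of the argument array is the tool name):

import org.apache.hadoop.conf.Configuration;
import org.apache.sqoop.Sqoop;

public class SqoopRunToolPoC {
    public static void main(String[] args) {
        // Same arguments as the CLI run, passed in-process.
        String[] sqoopArgs = {
            "import",
            "--connect", "jdbc:mysql://myip:3306/rec_db",
            "--table", "users",
            "--username", "root",
            "--password", "mypwd",
            "--as-avrodatafile",
            "--compression-codec", "snappy",
            "--warehouse-dir", "/user/hive/warehouse"
        };
        int ret = Sqoop.runTool(sqoopArgs, new Configuration());
        System.exit(ret);
    }
}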


My environment is the Cloudera QuickStart VM 5.4.2. According to the JAR files, Sqoop1 is version 1.4.5.

Independently, I have also tried Sqoop2 using the code here, but the parameters in the "linkConfig.xxxxx" strings seem not to be standard, and the program complains that those keys are not defined. I have added a comment to an existing SO question.
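Since the input names apparently differ between Sqoop2 releases, I am thinking of asking the server for the names it actually knows instead of hard-coding them. A rough sketch, assuming the Sqoop2 client library that ships with CDH 5.4 and the default server URL on the QuickStart VM (both assumptions on my part):

import org.apache.sqoop.client.SqoopClient;
import org.apache.sqoop.model.MConfig;
import org.apache.sqoop.model.MConnector;
import org.apache.sqoop.model.MInput;

public class ListLinkConfigKeys {
    public static void main(String[] args) {
        // Assumed default Sqoop2 server URL on the QuickStart VM.
        SqoopClient client = new SqoopClient("http://quickstart.cloudera:12000/sqoop/");
        for (MConnector connector : client.getConnectors()) {
            System.out.println("Connector: " + connector.getUniqueName());
            // Print every link-config input name this server version defines.
            for (MConfig config : connector.getLinkConfig().getConfigs()) {
                for (MInput<?> input : config.getInputs()) {
                    System.out.println("  " + input.getName());
                }
            }
        }
    }
}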


Would Kite help with my goals?

What is the recommended way to go?


Any help is much appreciated.