
Using Sqoop API to import DB to Hive

I have the following Sqoop command that works in my environment. The remote table gets imported into a Hadoop table and I get an AVSC file:

sqoop import --table users \
    --connect jdbc:mysql://myip:3306/rec_db \
    --username=root \
    --password=mypwd \
    --compression-codec=snappy \
    --as-avrodatafile \
    --warehouse-dir=/user/hive/warehouse

Now I am trying to convert that to Java code using the Sqoop API. This is the code so far:

import com.cloudera.sqoop.SqoopOptions;   // note: the com.cloudera.* classes are flagged as deprecated in 1.4.5
import com.cloudera.sqoop.tool.ImportTool;

public class SqoopPoC1 {

    public static void main(String[] args) {
        SqoopOptions options = new SqoopOptions();
        options.setConnectString("jdbc:mysql://myip:3306/rec_db");
        options.setTableName("users");
        options.setUsername("root");
        options.setPassword("mypwd");
        //options.setCompressionCodec("snappy");
        options.setNumMappers(1);   // default value is 4
        //options.setTargetDir("/user/hive/warehouse");
        //options.setWarehouseDir("/user/hive/warehouse");

        int ret = new ImportTool().run(options);
        System.out.println("Ret: " + ret);
    }
}

This copies the remote table to a regular file on the local filesystem (not into HDFS). I have tried specifying setTargetDir and setWarehouseDir, but everything has failed so far. I get errors like:

2015-08-27 20:25:54,738 WARN  [Thread-19] mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local2044679838_0001
java.lang.Exception: java.io.IOException: Mkdirs failed to create file:/user/hive/warehouse/_temporary/0/_temporary/attempt_local2044679838_0001_m_000000_0 (exists=false, cwd=file:/home/cloudera)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)

or the one in this other SO question.
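From the cwd=file:/home/cloudera part of the stack trace, my guess is that the job runs with LocalJobRunner against the local filesystem because my code never loads the cluster configuration. A sketch of what I am planning to try next (assuming the QuickStart VM keeps its client config under /etc/hadoop/conf; I have not verified that this fixes it):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

import com.cloudera.sqoop.SqoopOptions;
import com.cloudera.sqoop.tool.ImportTool;

public class SqoopPoC1WithConf {

    public static void main(String[] args) {
        // Load the cluster config so paths resolve against HDFS instead
        // of file:/// (assumption: the QuickStart VM keeps its client
        // configuration under /etc/hadoop/conf).
        Configuration conf = new Configuration();
        conf.addResource(new Path("file:///etc/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("file:///etc/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("file:///etc/hadoop/conf/mapred-site.xml"));

        SqoopOptions options = new SqoopOptions();
        options.setConf(conf);
        options.setConnectString("jdbc:mysql://myip:3306/rec_db");
        options.setTableName("users");
        options.setUsername("root");
        options.setPassword("mypwd");
        options.setNumMappers(1);
        options.setWarehouseDir("/user/hive/warehouse");

        int ret = new ImportTool().run(options);
        System.out.println("Ret: " + ret);
    }
}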

 

I am using Sqoop1, and one of the methods I use is deprecated, but as far as I have seen the Sqoop2 API is not as flexible (I read this in answers to other SO questions, but I can't find them now).

I would appreciate advice on how best to proceed to use Sqoop and Hive programmatically.

I have also seen it recommended on SO to run Sqoop by passing the command-line arguments as strings to a shell-like entry point, as in this other old question (a sketch of that pattern is below); I am hoping things have improved since then. I also found this similar question.
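In case it clarifies what I mean, my reconstruction of that pattern, using the same parameters as my working command line above:

import org.apache.sqoop.Sqoop;

public class SqoopRunToolPoC {

    public static void main(String[] args) {
        // CLI-style arguments passed as strings; Sqoop.runTool()
        // parses them the same way the sqoop shell script would.
        String[] sqoopArgs = {
            "import",
            "--connect", "jdbc:mysql://myip:3306/rec_db",
            "--table", "users",
            "--username", "root",
            "--password", "mypwd",
            "--compression-codec", "snappy",
            "--as-avrodatafile",
            "--warehouse-dir", "/user/hive/warehouse"
        };
        int ret = Sqoop.runTool(sqoopArgs);
        System.out.println("Ret: " + ret);
    }
}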

 

My environment is the Cloudera QuickStart VM 5.4.2; judging by the JAR files, Sqoop1 is version 1.4.5.

Independently, I have also tried Sqoop2 using the code here, but the parameters in the "linkConfig.xxxxx" strings do not seem to be standard, and the program complains that those keys are not defined. I have added a comment to an existing SO question.

 

Would Kite help with my goals?

What is the recommended way to go?

 

Any help is much appreciated.
