Support Questions

Find answers, ask questions, and share your expertise

Sqoop Doubt

avatar
Explorer

Hi ,

 

 

Suppose if We have a data in MySQL db in STUDENT table ,  we have imported the data in HDFS. there is no predefined Hive table. Data is tab delimited text files.How the data can be make available to Hive Table.(Assume that not delimited ).

 

Please help me ...Iam confused with sqoop commands, or is it possible to do with creat external table using hive -e" create external table....", or do we have to use sqoop hive import commnad??

 

 

Waiting for the reply..

Please help me

1 ACCEPTED SOLUTION

avatar
Explorer
Okkk thanks Harish... 🙂

View solution in original post

10 REPLIES 10

avatar
Mentor
You have a couple of options at least:

0. Import from Sqoop directly into the required file format and/or table,
instead of just delimited text. Sqoop supports Text, Sequence, Avro and
Parquet formats.

2. Import into a new (temporary) table thats created with the delimited
format specifiers. Load the data into this table via LOAD DATA LOCAL INPATH
statement, then use INSERT INTO statement to move the data into the
original table that uses a different file format.

avatar
Explorer
Hi Harish,

Thanks for the reply & your valuable time spent for my question..

Actually what I got confused is ..if the data is already in HDFS, then what is the need of sqoop?? We can create external table & make that data available .right?? Please correct me if i am wrong.

avatar
Mentor
Sqoop should be used if you have to import or export data from/to RDBMSs
like MySQL/Oracle/etc.; It will allow you to grab data from there and load
it into HDFS.

If you already have your data in HDFS, then Sqoop may not be what you are
looking for (unless you want to write data back to RDBMSs).

avatar
Explorer
Hi Harish,

So right answer will be the creating external table right?


Thanks
Priyap

avatar
Mentor
That sounds about right!

avatar
Explorer
Ok Thanks Harish for your valuable help.

I have few more doubts.

I have been searching for the difference between 2 sqoop commnads, Please anyone tell me when we will use each commands, what is the significance of them , Under what situation we will use?


1) sqoop-import --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home

2) sqoop import --connect <JDBC connection string> --table <tablename> --username <username> --password <password> --hive-import


What is the significance of using --create-hive-table in 1st command. Even if its not there it will import the data in to hive using"--hive-import"coamnd?. Then what is the use of that? In second command we are doing the same, but no --create-hive-table. why?.

Pelase help me..


Thanks a lot..

avatar
Mentor
Checkout the Sqoop User Guide documentation at
http://archive.cloudera.com/cdh5/cdh/5/sqoop/SqoopUserGuide.html:

| --create-hive-table | If set, then the job will fail if the target hive
table exits. By default this property is false. |

avatar
Explorer
Thanks Harish.

I have seen the same in the site..

Its telling it is used to create hive table, if there is existing table job will fail.Right?

So my doubt is if we don't use --create-hove-table in the 1st command, it should create a hive table right using "--hive-import"? So what is the significance of using --create-hive-table command??

Please correct if I am wrong.


Thanks for your help Harish..

avatar
Mentor
The option is useful to ensure/assert you don't already have an existing
table and the data should not be appended into an existing table.

This is required in some workflows that load staging tables, for example