Sqoop Import - Now what

Contributor

Hi,

I wondered if someone could point me in the right direction.

I've imported some data (4 rows) into HDFS via Sqoop using the command below:

sqoop import --connect "jdbc:sqlserver://ipaddress:port;database=dbname;user=username;password=userpassword" --table policy --target-dir "/user/maria_dev/data/SQLImport"

This worked and gave me 5 files:

part-m-00000
part-m-00001
part-m-00002
part-m-00003
_SUCCESS

Given there are 4 rows in my table and 4 part files, I assume each part file holds one row and the fifth file is the success marker.

Where I need some help is with:

1. Understanding what these files are. Are they Avro files?

2. Creating a Hive table over the top of them, as I would with the upload button in Ambari, so that they can be queried from Hive.

Any help or pointers would be massively appreciated.

Thanks,

Nic
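
On the first question: Sqoop writes delimited text files by default and only produces Avro when --as-avrodatafile is passed, and the number of part-m-* files follows the number of map tasks (four by default) rather than the number of rows. A quick way to confirm this on the cluster is to look inside the files. A minimal sketch, assuming the hdfs client is on the PATH and using the directory from the import command above:

```shell
#!/usr/bin/env bash
# Directory taken from the --target-dir option of the import command in the question.
IMPORT_DIR="/user/maria_dev/data/SQLImport"

if command -v hdfs >/dev/null 2>&1; then
  # List the part files and the _SUCCESS marker, with their sizes.
  hdfs dfs -ls "$IMPORT_DIR"
  # Print the first lines of the part files; the glob is quoted so that
  # HDFS expands it, not the local shell. Text output is human-readable.
  hdfs dfs -cat "$IMPORT_DIR/part-m-*" | head -n 20
else
  echo "hdfs client not found; run this on a cluster node" >&2
fi
```

If the output is readable, comma-separated rows, the files are plain text rather than Avro.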

1 ACCEPTED SOLUTION

Expert Contributor

@Nic Hopper You can import the table directly into Hive with --hive-import:

sqoop import --connect "jdbc:sqlserver://ipaddress:port;database=dbname;user=username;password=userpassword" --table policy --warehouse-dir "/user/maria_dev/data/SQLImport" --hive-import --hive-overwrite

This creates the Hive table and writes the data into it. Sqoop generally creates a managed table, so the data is finally moved into the Hive warehouse directory (hive.metastore.warehouse.dir).
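
If the files should stay where the original import put them, an external table over that directory is an alternative to re-importing. The following is only a sketch: it assumes the files are comma-delimited text (Sqoop's default), and the column names policy_id and policy_name, the table name policy_ext and the script name create_policy_ext.hql are all hypothetical, since the schema of the policy table is not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical file name for the generated DDL script.
DDL_FILE="create_policy_ext.hql"

# Column names and types are placeholders; replace them with the real
# schema of the policy table. LOCATION is the --target-dir from the question.
cat > "$DDL_FILE" <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS policy_ext (
  policy_id   INT,
  policy_name STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/maria_dev/data/SQLImport';
EOF

if command -v hive >/dev/null 2>&1; then
  # Run the DDL; the files in HDFS are left in place.
  hive -f "$DDL_FILE"
else
  echo "hive client not found; run '$DDL_FILE' from a cluster node" >&2
fi
```

Because the table is external, dropping it later removes only the table definition and leaves the imported files in HDFS.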


REPLIES

Rising Star

@Nic Hopper

As @icocio points out, you can simply use Sqoop to fetch the data and write it directly to a Hive table.

Contributor

Hi,

Thank you all for the responses. It works as expected. I do have another question, though it is more a request for advice than anything.

Now that I can import data from SQL Server into Hive, what is the best way to apply business logic to my data? Only something simple for now, I think.

Should I:

1. Do it in the import, by importing the result of a query rather than a whole table and putting the logic in that query.

2. Import it into Hive as I have done and then apply the logic there.

3. Do something else.

Any pointers would be appreciated.

Thanks,

Nic.
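
For reference, options 1 and 2 above could look roughly like this. It is a sketch under stated assumptions, not a recommendation: the columns policy_id, policy_name and premium, the filter itself, the staging directory SQLImportFiltered, and the names policy_filtered and policy_filtered_v are all hypothetical. What is not hypothetical is that a Sqoop free-form --query must contain the literal token $CONDITIONS and needs --target-dir, plus --split-by when more than one mapper is used.

```shell
#!/usr/bin/env bash
# Connection string copied from the thread; its values are placeholders.
CONNECT="jdbc:sqlserver://ipaddress:port;database=dbname;user=username;password=userpassword"

# Option 1: apply the logic in the import query.
# Single quotes keep $CONDITIONS literal so Sqoop can substitute its own
# split predicate. Column names and the filter are hypothetical.
QUERY='SELECT policy_id, policy_name FROM policy WHERE premium > 0 AND $CONDITIONS'

import_filtered() {
  sqoop import \
    --connect "$CONNECT" \
    --query "$QUERY" \
    --split-by policy_id \
    --target-dir "/user/maria_dev/data/SQLImportFiltered" \
    --hive-import \
    --hive-table policy_filtered
}

# Option 2: keep the unchanged import and layer the logic on top in Hive,
# here as a view over the already imported policy table.
VIEW_SQL='CREATE VIEW IF NOT EXISTS policy_filtered_v AS
SELECT policy_id, policy_name FROM policy WHERE premium > 0;'

create_view() {
  hive -e "$VIEW_SQL"
}

# Nothing touches the cluster unless RUN is set explicitly.
case "${RUN:-}" in
  import) import_filtered ;;
  view)   create_view ;;
  *)      echo "Set RUN=import or RUN=view to execute one of the options." ;;
esac
```

Option 1 reduces what is transferred but ties the logic to the import job; option 2 keeps the raw data in Hive, so the logic can be changed later without re-importing.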