Sqoop Import - Now what?
Labels: Apache Hive, Apache Sqoop
Created 01-30-2017 02:30 PM
Hi,
I wondered if someone could point me in the right direction.
I've imported some data (4 rows) into HDFS via Sqoop using the command below:
sqoop import --connect "jdbc:sqlserver://ipaddress:port;database=dbname;user=username;password=userpassword" --table policy --target-dir "/user/maria_dev/data/SQLImport"
This worked correctly and gave me 5 files:
part-m-00000,
part-m-00001,
part-m-00002,
part-m-00003
and _SUCCESS
Given there are 4 rows in my table and 4 part files, I assume each part file holds one row, plus the success file.
Where I need some help is with:
1. Understanding what these files are. Are they Avro files?
2. Creating a Hive 'table' over the top of these, like I would by using the upload button in Ambari, making them accessible to Hive querying.
Any help or pointers would be massively appreciated.
Thanks,
Nic
Created 01-30-2017 02:44 PM
Please check the following; it could give you some pointers:
https://community.hortonworks.com/articles/17469/creating-hive-partitioned-tables-using-sqoop.html
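For what it's worth: the part files are Sqoop's default plain-text output (comma-delimited), not Avro, unless --as-avrodatafile was passed, and there are four of them because Sqoop runs four mappers by default, not because there is one file per row. Below is a minimal sketch of the external-table route the article describes, with a hypothetical two-column schema; adjust the column list to match the real policy table:

# Hypothetical columns; replace with the actual policy schema.
# Hive skips the _SUCCESS marker (leading-underscore files are treated as hidden).
hive -e "
CREATE EXTERNAL TABLE policy_ext (
  policy_id   INT,
  policy_name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/maria_dev/data/SQLImport';
SELECT * FROM policy_ext;"

Because the table is EXTERNAL, dropping it later removes only the metadata and leaves the imported files in place.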
Created 01-30-2017 05:24 PM
Like @icocio points out, you can simply use Sqoop to fetch the data and write to a Hive table directly.
Created 02-01-2017 09:54 AM
@Nic Hopper You can import the table directly into Hive with --hive-import:
sqoop import --connect "jdbc:sqlserver://ipaddress:port;database=dbname;user=username;password=userpassword" --table policy --warehouse-dir "/user/maria_dev/data/SQLImport" --hive-import --hive-overwrite
It creates the Hive table and writes the data into it (for a managed table, the data is ultimately moved under hive.metastore.warehouse.dir).
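To verify the result, something along these lines (assuming the table lands in the default database; on an HDP sandbox the managed warehouse is typically /apps/hive/warehouse):

hive -e "DESCRIBE FORMATTED policy; SELECT * FROM policy;"
hdfs dfs -ls /apps/hive/warehouse/policy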
Created 02-01-2017 02:01 PM
Hi,
Thank you all for the responses. It works as expected. I do have another question though; it's more a request for advice than anything.
So I can now import data from SQL Server to Hive, but if I want to apply business logic to my data, how do you think I'm best doing this? Only something simple for now, I think.
Shall I do it:
1. In the import, i.e. import from a query rather than a table and put the logic there (sketched below)?
2. Import it to Hive as I have done, then transform it there?
3. Some other way?
Any pointers would be appreciated.
Thanks,
Nic.
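A minimal sketch of option 1, assuming hypothetical column names and a hypothetical filter: Sqoop's free-form --query pushes simple logic into the import itself. Note that --query requires the literal $CONDITIONS token, a --target-dir, and either --split-by or -m 1:

# Columns and WHERE clause are hypothetical; \$CONDITIONS is mandatory with --query.
sqoop import \
  --connect "jdbc:sqlserver://ipaddress:port;database=dbname;user=username;password=userpassword" \
  --query "SELECT policy_id, policy_name FROM policy WHERE active = 1 AND \$CONDITIONS" \
  --target-dir "/user/maria_dev/data/SQLImportFiltered" \
  -m 1 \
  --hive-import --hive-table policy_filtered

For anything beyond simple filtering, option 2 is usually the more maintainable route: land the raw table in Hive as above, then express the business logic as Hive INSERT ... SELECT statements, which are easy to rerun and evolve.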
