Created 01-30-2017 02:30 PM
Hi,
I wondered if someone could point me in the right direction.
I've imported some data (4 rows) into HDFS via Sqoop using the command below:
sqoop import --connect "jdbc:sqlserver://ipaddress:port;database=dbname;user=username;password=userpassword" --table policy --target-dir "/user/maria_dev/data/SQLImport"
This worked correctly and gave me 5 files:
part-m-00000,
part-m-00001,
part-m-00002,
part-m-00003
and _SUCCESS
Given there are 4 rows in my table and 4 part files, I assume each part file holds one row, plus the _SUCCESS marker file.
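I can see them with the HDFS CLI, for example:
hdfs dfs -ls /user/maria_dev/data/SQLImport
hdfs dfs -cat /user/maria_dev/data/SQLImport/part-m-00000
but I'm not sure what I'm actually looking at.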
Where I need some help is with:
1. Understanding what these files are. Are they Avro files?
2. How can I create a Hive 'table' over the top of these, like I would by using the upload button in Ambari, making them accessible to Hive querying?
Any help or pointers would be massively appreciated.
Thanks,
Nic
Created 02-01-2017 09:54 AM
@Nic Hopper You can import the table directly into Hive with --hive-import:
sqoop import --connect "jdbc:sqlserver://ipaddress:port;database=dbname;user=username;password=userpassword" --table policy --warehouse-dir "/user/maria_dev/data/SQLImport" --hive-import --hive-overwrite
This creates the Hive table and writes the data into it (a managed table by default, so the files finally end up under hive.metastore.warehouse.dir).
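On your question 1: by default Sqoop writes plain comma-delimited text files (one part-m file per mapper, and the default is 4 mappers), not Avro. You would only get Avro if you passed --as-avrodatafile. If you would rather keep the files you already have in /user/maria_dev/data/SQLImport, another option is an external Hive table over that directory. This is only a sketch, the column names and types are placeholders for your actual policy schema, and it assumes Sqoop's default comma-delimited text output:
CREATE EXTERNAL TABLE policy_ext (
  policy_id INT,       -- placeholder columns, replace with the real policy columns
  policy_name STRING,
  premium DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/maria_dev/data/SQLImport';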
Created 01-30-2017 02:44 PM
Please check the following; it could give you some pointers:
https://community.hortonworks.com/articles/17469/creating-hive-partitioned-tables-using-sqoop.html
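For a partitioned Hive import, the general shape is roughly the following (only a sketch; region and US are placeholders for your own partition key and value, and the connection string is the same as in your command). The linked article covers the details:
sqoop import --connect "jdbc:sqlserver://ipaddress:port;database=dbname;user=username;password=userpassword" --table policy --hive-import --hive-table policy --hive-partition-key region --hive-partition-value "US"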
Created 01-30-2017 05:24 PM
Like @icocio points out, you can simply use Sqoop to fetch the data and write it to a Hive table directly.
Created 02-01-2017 02:01 PM
Hi,
Thank you all for the responses; it works as expected. I do have another question though, more a request for advice than anything.
So I can now import data from SQL Server into Hive, but if I want to apply some business logic to my data, how do you think I'm best doing this? Only something simple for now, I think.
Shall I do it:
1. In the import itself, so query the data rather than import a table, and have some logic there (see the sketch after this list).
2. Import it into Hive as I have done and then apply the logic there.
3. Do something else.
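For option 1, I'm imagining something along these lines. Just a sketch: the premium > 1000 filter, the policy_filtered table name and the staging target dir are made-up examples, and as I understand it Sqoop needs the literal $CONDITIONS token plus either --split-by or -m 1 when you use --query:
sqoop import --connect "jdbc:sqlserver://ipaddress:port;database=dbname;user=username;password=userpassword" --query 'SELECT * FROM policy WHERE premium > 1000 AND $CONDITIONS' -m 1 --target-dir "/user/maria_dev/data/SQLFiltered" --hive-import --hive-table policy_filtered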
Any pointers would be appreciated.
Thanks,
Nic.