I wondered if someone could point me in the right direction.
I've imported some data (4 rows) into HDFS via Sqoop using the command below:
sqoop import --connect "jdbc:sqlserver://ipaddress:port;database=dbname;user=username;password=userpassword" --table policy --target-dir "/user/maria_dev/data/SQLImport"
This worked correctly and gave me 5 files: four part files plus the success file. Given there are 4 rows in my table and 4 part files, I assume each part file holds one row.
Where I need some help is with:
1. Understanding what these files are. Are they Avro files?
2. Creating a Hive table over the top of them, like I would by using the upload button in Ambari, so they're accessible to Hive querying.
Any help or pointers would be massively appreciated.
@Nic Hopper You can import the table directly into Hive with --hive-import:
sqoop import --connect "jdbc:sqlserver://ipaddress:port;database=dbname;user=username;password=userpassword" --table policy --warehouse-dir "/user/maria_dev/data/SQLImport" --hive-import --hive-overwrite
It creates the Hive table and writes the data into it (for a managed table, the data ultimately ends up under hive.metastore.warehouse.dir).
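Once the import finishes you can sanity-check it from the command line. A minimal check, assuming Sqoop kept the source table name policy (the default when --hive-table isn't specified):
hive -e "DESCRIBE policy;"
hive -e "SELECT * FROM policy LIMIT 5;"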
Please check the following; these could give you some pointers:
Like @icocio points out, you can simply use Sqoop to fetch the data and write to a Hive table directly.
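On your question 1: by default Sqoop writes plain comma-delimited text files, not Avro (you'd only get Avro if you passed --as-avrodatafile). So for your question 2, the alternative to --hive-import is an external table pointed at the files you already have. A minimal sketch, assuming the default comma-delimited text output and placeholder column names/types (I don't know your policy schema, so adjust these):
hive -e "
CREATE EXTERNAL TABLE policy_ext (
  policy_id INT,      -- placeholder columns: replace with your real policy schema
  holder_name STRING,
  premium DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/maria_dev/data/SQLImport';
"
Because the table is external, dropping it leaves your files in place; Hive just reads them where they are.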
Thank you all for the responses. It works as expected. I do have another question though, more advice than anything.
So I can now import data from SQL Server into Hive, but if I want to apply business logic to my data, how do you think I'm best doing this? Only something simple for now, I think.
Shall I do it:
1. In the import, i.e. use a query with the logic in it rather than a plain table in my Sqoop import (a sketch of what I mean is below).
2. Import it into Hive as I have done, and then apply the logic there.
3. Some other way entirely.
Any pointers would be appreciated.
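To make option 1 concrete, this is roughly what I had in mind (untested sketch; premium and policy_id are placeholder columns, and the WHERE clause is just a stand-in for real logic):
sqoop import --connect "jdbc:sqlserver://ipaddress:port;database=dbname;user=username;password=userpassword" --query "SELECT * FROM policy WHERE premium > 100 AND \$CONDITIONS" --split-by policy_id --target-dir "/user/maria_dev/data/SQLImport" --hive-import --hive-table policy_filtered
If I've read the docs right, --query needs the literal \$CONDITIONS token plus --split-by, and --hive-table since there's no source table name to reuse. Option 2 would just be something like hive -e "CREATE TABLE policy_filtered AS SELECT * FROM policy WHERE premium > 100;" over the table I've already imported.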