Created 04-18-2016 03:04 PM
Right now, we use a two-step process to import data with Sqoop into ORC tables.
Step 1: Use Sqoop to import the raw data (in text format) into Hive tables.
Step 2: Use INSERT OVERWRITE ... SELECT to write that data into an ORC-backed Hive table.
With this approach, we have to manually create the ORC-backed tables that Step 2 writes into. It also leaves behind the raw data in text format, which we don't really need. Is there a way to write directly into Hive tables in ORC format? Also, is there a way to avoid manually creating the ORC-backed tables from the text-file-backed tables?
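For reference, the manual part of Step 2 amounts to HiveQL roughly like what the sketch below generates for one table. It is only an illustration: the table names, the column list, and the types are hypothetical, and in practice the column list has to match the schema of the text table that Sqoop created.

```python
from __future__ import annotations


def step2_hiveql(text_table: str, orc_table: str,
                 columns: list[tuple[str, str]]) -> str:
    """Return the Step 2 HiveQL: create an ORC-backed table, then copy the
    Sqoop-imported text table into it with INSERT OVERWRITE ... SELECT."""
    # One "name TYPE" definition per column for the CREATE TABLE statement.
    col_defs = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    # Explicit column list for the SELECT, in the same order.
    col_list = ", ".join(name for name, _ in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {orc_table} (\n  {col_defs}\n) STORED AS ORC;\n"
        f"INSERT OVERWRITE TABLE {orc_table}\n"
        f"SELECT {col_list} FROM {text_table};\n"
    )


if __name__ == "__main__":
    # Hypothetical two-column table.
    print(step2_hiveql("customers_txt", "customers_orc",
                       [("id", "INT"), ("name", "STRING")]))
```

Repeated across hundreds of tables, this per-table DDL is the overhead the question is trying to get rid of.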
Created 04-18-2016 07:07 PM
Ravi, you can use Sqoop to import tables and store them directly as ORC. The key option is --hcatalog-storage-stanza.
Check out the Sqoop documentation:
http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_importing_data_into_hive
and review section 22.3, Automatic Table Creation.
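As a rough sketch of what such an import could look like, the function below assembles the argument list for a single `sqoop import` that writes straight into an ORC-backed table through HCatalog. The flags `--hcatalog-database`, `--hcatalog-table`, `--create-hcatalog-table`, and `--hcatalog-storage-stanza` are documented Sqoop options; the JDBC URL, user name, password-file path, and table names are placeholders. `--create-hcatalog-table` is what asks Sqoop to create the Hive table itself (per the Sqoop guide, the job aborts if that table already exists), so check the behavior against the guide for your Sqoop version.

```python
from __future__ import annotations

import shlex


def sqoop_orc_import_args(jdbc_url: str, username: str, password_file: str,
                          source_table: str, hcat_db: str, hcat_table: str,
                          mappers: int = 4) -> list[str]:
    """Build argv for one Sqoop import that writes directly into an
    ORC-backed Hive table via HCatalog (no intermediate text table)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", source_table,
        "--hcatalog-database", hcat_db,
        "--hcatalog-table", hcat_table,
        # Ask Sqoop to create the Hive table (the job aborts if it exists).
        "--create-hcatalog-table",
        # Storage clause for the generated table: ORC instead of text.
        "--hcatalog-storage-stanza", "stored as orcfile",
        "-m", str(mappers),
    ]


if __name__ == "__main__":
    # Placeholder connection details; print a copy-pasteable command line.
    args = sqoop_orc_import_args(
        "jdbc:mysql://db.example.com/sales", "etl_user",
        "/user/etl/.password", "CUSTOMERS", "default", "customers")
    print(shlex.join(args))
```

Returning a list rather than a single string keeps the space-containing stanza value as one argument, and `shlex.join` quotes it correctly when a printable command line is needed.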
Created 04-18-2016 03:07 PM
Yes, check out this doc page, for example: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_dataintegration/content/incrementally-up...
Created 04-18-2016 04:07 PM
That topic is mostly about Change Data Capture. We use similar techniques for that use case, but my question is not related to it. Most of our cases are full data loads, and we want to make this process easier because we have hundreds of tables. Sqoop has a good way to create table metadata, which we are using, but the result is stored as text files and we have to manually create another set of tables to write the data as ORC files.
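With the HCatalog options there is no intermediate text table, so a full load of many tables reduces to one `sqoop import` per table. A minimal driver sketch, assuming a hypothetical list of source table names and placeholder connection details; by default it only returns and prints the commands (`dry_run=True`), and naming each Hive table after the lower-cased source table is an assumption of this sketch.

```python
import shlex
import subprocess


def import_all(tables, jdbc_url, username, password_file,
               hcat_db="default", dry_run=True):
    """Run (or just list) one Sqoop HCatalog/ORC import per source table.

    Returns the shell command strings so they can be logged or written to a
    script. When not a dry run, stops at the first failed import.
    """
    commands = []
    for table in tables:
        argv = [
            "sqoop", "import",
            "--connect", jdbc_url,
            "--username", username,
            "--password-file", password_file,
            "--table", table,
            "--hcatalog-database", hcat_db,
            # Hive table named after the source table, lower-cased (assumption).
            "--hcatalog-table", table.lower(),
            "--create-hcatalog-table",
            "--hcatalog-storage-stanza", "stored as orcfile",
        ]
        commands.append(shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if Sqoop exits non-zero.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical table list; in practice it might come from a file or
    # from the source database's catalog.
    for cmd in import_all(["CUSTOMERS", "ORDERS"],
                          "jdbc:mysql://db.example.com/sales",
                          "etl_user", "/user/etl/.password"):
        print(cmd)
```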
Created 04-26-2016 05:43 PM
I am trying to import Oracle tables using Sqoop, but the import job fails with an error. I have attached the logs. Any thoughts?
Please let me know your suggestions.
Thanks,
Created 04-25-2016 09:58 PM
Thanks @bhagan. If you can move this to an answer, I will accept it. I am planning to write up a few of my learnings on this in an article here soon.
Created 05-26-2017 06:17 PM
Hello Bhagan,
In the above Sqoop script, how does Sqoop know that the table has to be created in Hive, given that Sqoop's HCatalog integration does not support "--hive-import"?
Please help me understand.
Created 06-06-2017 04:26 PM