Can sqoop be used to directly import data into an ORC table?
Labels: Apache Sqoop
Created 04-18-2016 03:04 PM
Right now, we use a two-step process to import data with Sqoop into ORC tables.
Step 1: Use Sqoop to import the raw data (in text format) into Hive tables.
Step 2: Use INSERT OVERWRITE ... SELECT to write this data into a Hive table that is stored as ORC.
With this approach, we have to manually create the ORC-backed tables that Step 2 writes into. It also leaves us with raw data in text format that we don't really need. Is there a way to write directly into Hive tables in ORC format? Also, is there a way to avoid manually creating the ORC-backed tables from the text-file-backed tables?
Created 04-18-2016 07:07 PM
Ravi, you can use Sqoop to import tables and store them directly as ORC. The key option is --hcatalog-storage-stanza.
Check out the Sqoop documentation:
http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_importing_data_into_hive
And review section 22.3, Automatic Table Creation.
Example:
$ sqoop import --connect jdbc:mysql://localhost/employees --username hive --password hive --table departments --hcatalog-database default --hcatalog-table my_table_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"
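Since the thread is about full loads of many tables, the example above could be wrapped in a small loop. This is only a sketch under assumptions: the table list, the "_orc" suffix, and the DRY_RUN switch are hypothetical and not from the thread, while the connection settings and Sqoop options are copied from the example above. By default it only prints the commands so they can be reviewed before anything runs.

```shell
#!/bin/sh
# Sketch: one Sqoop import per table, each writing straight into an
# ORC-backed table through HCatalog.
# Hypothetical (not from the thread): the table list, the "_orc" suffix,
# and the DRY_RUN switch. Connection settings come from the example above.

JDBC_URL="jdbc:mysql://localhost/employees"
DB_USER="hive"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = run it

import_table() {
  table="$1"
  # Rebuild the positional parameters as the full sqoop command line.
  set -- sqoop import \
    --connect "$JDBC_URL" \
    --username "$DB_USER" \
    --password hive \
    --table "$table" \
    --hcatalog-database default \
    --hcatalog-table "${table}_orc" \
    --create-hcatalog-table \
    --hcatalog-storage-stanza "stored as orcfile"
  if [ "$DRY_RUN" = "1" ]; then
    # Printed for review only; the quotes around the stanza are not shown.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical table list; replace with the real source tables.
for t in departments employees; do
  import_table "$t"
done
```

Running it with DRY_RUN=0 executes the imports for real, one table at a time.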
Created 04-18-2016 03:07 PM
Yes, check out this doc page, for example: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_dataintegration/content/incrementally-up...
Created 04-18-2016 04:07 PM
That topic is mostly about Change Data Capture. We use similar techniques for that use case, but my question is not related to it. Most of our cases are full data loads. We are looking to make this process easier since we have hundreds of tables. Sqoop has a good way to create table metadata, which we are using. But the data ends up as text files, and we have to manually create another set of tables to write it out as ORC files.
Created 04-26-2016 05:43 PM
ERROR tool.ImportTool: Error during import: Import job failed!
I am trying to import Oracle tables using Sqoop, and the import job fails with the error above. I have attached the logs. Any thoughts?
Can you please let me know your suggestions?
Thanks,
Created 04-25-2016 09:58 PM
Thanks @bhagan. If you can move this to an answer, I will accept it. I am planning to add a few of my learnings on this in an article here soon.
Created 05-26-2017 06:17 PM
Hello Bhagan,
In the above Sqoop command, how does Sqoop know that the table has to be created in Hive, given that the Sqoop HCatalog options do not support "--hive-import"?
Please help me understand.
Created 06-06-2017 04:26 PM
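The "--create-hcatalog-table" option tells Sqoop to create the table through HCatalog, which registers it in the Hive metastore, so it shows up as a Hive table.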
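One way to confirm that the table really was created as ORC is to look at the output of Hive's DESCRIBE FORMATTED. The following is a minimal sketch, assuming the hive CLI is on the PATH and the table name from the example earlier in the thread; is_orc is a hypothetical helper name, not part of Sqoop or Hive.

```shell
#!/bin/sh
# Sketch: decide whether a Hive table is ORC-backed by inspecting the text
# that DESCRIBE FORMATTED prints. is_orc is a hypothetical helper name.

# Reads DESCRIBE FORMATTED output on stdin and succeeds if the table's
# InputFormat line names Hive's ORC input format class.
is_orc() {
  grep -q 'InputFormat:.*OrcInputFormat'
}

# Only query Hive when the hive CLI is actually installed on this machine.
if command -v hive >/dev/null 2>&1; then
  if hive -e 'DESCRIBE FORMATTED default.my_table_orc' | is_orc; then
    echo "default.my_table_orc is stored as ORC"
  else
    echo "default.my_table_orc is not stored as ORC"
  fi
fi
```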
