Support Questions

Find answers, ask questions, and share your expertise

Can sqoop be used to directly import data into an ORC table?

Guru

Right now, we use a two-step process to import data from Sqoop into ORC tables.

Step 1: Use Sqoop to import the raw data, in text format, into Hive tables.

Step 2: Use an INSERT OVERWRITE ... SELECT statement to write this into a Hive table stored as ORC.

With this approach, we have to manually create the ORC-backed tables that Step 2 writes into, and we also end up with raw data in text format that we don't really need. Is there a way to write into Hive tables directly in ORC format? And is there a way to avoid manually creating the ORC-backed tables from the text-file-backed ones?
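For context, the two-step flow described above might look roughly like this; the connection string, credentials, and table names below are placeholders, not values from the original post:

```shell
# Step 1: Sqoop import into a text-backed Hive staging table
sqoop import \
  --connect jdbc:mysql://dbhost/sourcedb \
  --username etl --password-file /user/etl/.pw \
  --table orders \
  --hive-import --hive-table staging_orders

# Step 2: copy the staged rows into a manually pre-created ORC table
hive -e "INSERT OVERWRITE TABLE orders_orc SELECT * FROM staging_orders;"
```

Both commands assume a working Hadoop/Hive cluster, so they are a sketch of the pattern rather than something runnable as-is.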

1 ACCEPTED SOLUTION

Super Collaborator

Ravi, you can use Sqoop to import tables and store them directly as ORC. The key option is --hcatalog-storage-stanza.

Check out the Sqoop documentation:

http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_importing_data_into_hive

and review section 22.3, "Automatic Table Creation".

Example:

$ sqoop import \
    --connect jdbc:mysql://localhost/employees \
    --username hive --password hive \
    --table departments \
    --hcatalog-database default \
    --hcatalog-table my_table_orc \
    --create-hcatalog-table \
    --hcatalog-storage-stanza "stored as orcfile"


7 REPLIES

Guru

That topic is mostly about Change Data Capture; we use similar techniques for that use case, but my question is not related to it. Most of our cases are full data loads. We are looking to make this process easier, since we have hundreds of tables. Sqoop has a good way to create the table metadata, which we are using, but the imports end up as text files and we have to manually create another set of tables to write as ORC files.
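With hundreds of tables, the direct-to-ORC invocation can be scripted per table. A minimal sketch, assuming a hypothetical list of source table names and placeholder connection details; it prints the commands so they can be reviewed before piping to sh:

```shell
#!/bin/sh
# Generate one Sqoop direct-to-ORC import command per source table.
# The table list and connection details are illustrative placeholders.
for tbl in orders customers departments; do
  echo "sqoop import" \
       "--connect jdbc:mysql://dbhost/sourcedb" \
       "--username etl --password-file /user/etl/.pw" \
       "--table $tbl" \
       "--hcatalog-database default" \
       "--hcatalog-table ${tbl}_orc" \
       "--create-hcatalog-table" \
       "--hcatalog-storage-stanza 'stored as orcfile'"
done
```

Printing first (rather than executing directly) makes it easy to spot-check the generated table names before kicking off hundreds of imports.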

Explorer

ERROR tool.ImportTool: Error during import: Import job failed!

I am trying to import Oracle tables using Sqoop, and the import job fails with the error above. I have attached the logs. Any thoughts?

mapreduce-import-failed.txt

Please let me know your suggestions.

Thanks,


Guru

Thanks @bhagan. If you can move this to an answer, I will accept it. I am planning to add a few of my learnings on this in an article here soon.

Contributor

Hello Bhagan,

In the above Sqoop command, how does Sqoop know that the table has to be created in Hive, given that Sqoop's HCatalog integration does not support --hive-import?

Please help me understand.


The --create-hcatalog-table option is what tells Sqoop to create the Hive table (through HCatalog) if it does not already exist.
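After the import, one way to confirm the table really was created with ORC storage is to inspect the metadata recorded in the metastore; this uses the example table name from the accepted solution and needs a live Hive installation:

```shell
# The InputFormat/OutputFormat and SerDe lines should mention ORC
hive -e "DESCRIBE FORMATTED default.my_table_orc;" | grep -i orc
```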