Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Can sqoop be used to directly import data into an ORC table?

Solved Go to solution
Highlighted

Can sqoop be used to directly import data into an ORC table?

Guru

Right now, we use a 2 step process to import data from sqoop to ORC tables.

Step 1: Use sqoop to import raw text (in text format) into Hive tables.

Step 2: Use insert overwrite as select to write this into a hive table that is of type ORC.

Now, with this approach, we have to manually create ORC backed tables that Step 2 writes into. This also ends up with raw data in text format that we don't really need. Is there a way to directly write into hive tables as ORC format? Also, is there a way to not manually create ORC backed tables from text file backed tables?

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: Can sqoop be used to directly import data into an ORC table?

Expert Contributor

Ravi, you can use Sqoop to import tables and store them directly as ORC. They key option is --hcatalog-storage-stanza.

Check out the documentation in Sqoop

http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_importing_data_into_hive

And review 22.3 Automatic Table Creation

Example:

$ sqoop import --connect jdbc:mysql://localhost/employees --username hive --password hive --table departments --hcatalog-database default --hcatalog-table my_table_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"

View solution in original post

7 REPLIES 7
Highlighted

Re: Can sqoop be used to directly import data into an ORC table?

Highlighted

Re: Can sqoop be used to directly import data into an ORC table?

Guru

That topic is mostly around Change Data Capture. We use similar techniques in that usecase. My question was not related to that. Most of our cases are full data loads. We are looking to make this process easier since we have hundreds of tables. Sqoop has a good way to create table metadata which we are using. But this ends up as textfiles and we have to create another set of tables manually to write as ORC files.

Highlighted

Re: Can sqoop be used to directly import data into an ORC table?

Explorer

ERROR tool.ImportTool: Error during import: Import job failed!

Trying to import Oracle tables using Sqoop. Getting error during the import job failed. I have attached the logs. Any thoughts!!

mapreduce-import-failed.txt

Can you please let me know your suggestions..

Thanks,

Highlighted

Re: Can sqoop be used to directly import data into an ORC table?

Expert Contributor

Ravi, you can use Sqoop to import tables and store them directly as ORC. They key option is --hcatalog-storage-stanza.

Check out the documentation in Sqoop

http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_importing_data_into_hive

And review 22.3 Automatic Table Creation

Example:

$ sqoop import --connect jdbc:mysql://localhost/employees --username hive --password hive --table departments --hcatalog-database default --hcatalog-table my_table_orc --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"

View solution in original post

Re: Can sqoop be used to directly import data into an ORC table?

Guru

Thanks @bhagan. If you can move this to answer, I will accept it as an answer. I am planning to add few of my learnings on this into an article here soon.

Highlighted

Re: Can sqoop be used to directly import data into an ORC table?

Explorer

Hello Bhagan,

In the above sqoop script, how does the compiler understand that the table has to be created in hive. As sqoop hcatalog does not support "Hive-import".

Please help me understand.

Highlighted

Re: Can sqoop be used to directly import data into an ORC table?

New Contributor

"--create-hcatalog-table " This tells hive to create table.

Don't have an account?
Coming from Hortonworks? Activate your account here