
Sqoop import & Hive ORC

Contributor

All,

I have a question about Sqoop. I am importing around 2 TB of data for one table and then need to write it into an ORC table. What's the best way to achieve this?

1) Sqoop all the data into dir1 as text and write HQL to load it into an ORC table (my script currently fails with a vertex issue)

2) Sqoop the data in chunks, process each chunk, and append it into the Hive table (have you done this?)

3) Sqoop with a Hive import that writes all the data directly into the Hive ORC table (rough sketch below)

Which is the best way?
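
For option 3, I was thinking of Sqoop's HCatalog options, which can write an ORC-backed Hive table directly (a rough sketch only; the JDBC URL, credentials, table names, split column, and mapper count are placeholders, not my real job):

# Rough sketch of option 3: import straight into an ORC table via HCatalog.
# Everything here (URL, credentials, names, counts) is a placeholder.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser -P \
  --table MY_BIG_TABLE \
  --hcatalog-database default \
  --hcatalog-table my_big_table_orc \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="ZLIB")' \
  --split-by ID \
  --num-mappers 24

That would skip the separate text staging directory and the big INSERT ... SELECT step of option 1, which is where my vertex failure happens.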

1 ACCEPTED SOLUTION


If the table has primary keys through which you can identify unique records, use those keys to pull the data in chunks and load it into Hive. Sqoop generally works well for bulk imports, but when the data is this large it is not recommended to import it in one shot; it also depends on your source RDBMS. I have run into the same situation: I was able to import a 20 TB table from Teradata into Hive and it worked perfectly fine, but when the table size grew to 30 TB I could not import it in one single stretch. In such cases I go with multiple chunks, and/or import the table using a primary key as the --split-by column and increase the number of mappers. The same should hold good for your scenario.
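
For example, something along these lines (a rough sketch only; the JDBC URL here is a generic placeholder, and the table names, key range, and mapper count are made up for illustration):

# One key range per run: restrict the chunk with --where, split it across
# mappers with --split-by, and append into the same Hive/HCatalog ORC table
# (created on the first run). Loop over the remaining ranges the same way.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser -P \
  --table MY_BIG_TABLE \
  --where "ID >= 0 AND ID < 100000000" \
  --split-by ID \
  --num-mappers 32 \
  --hcatalog-database default \
  --hcatalog-table my_big_table_orc

Each range runs as its own Sqoop job, so a failure only costs you one chunk instead of the whole import.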
