How to bulk load tables directly from Oracle into Impala

Dear Cloudera Community,

I am new to this community and would like to know how to do a one-time bulk load from an Oracle database, followed by daily incremental loads, into Impala with Sqoop or any other tool, without using Hive at all (ignoring the Hive database completely). I assume Impala is a complete replacement for Hive in certain situations.

Regards

Anis

1 ACCEPTED SOLUTION

@AnisurRehman

1. Please refer to this official guide to learn more about Sqoop. Change the version in the URL to match your Sqoop version:

https://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html

2. Yes, a bulk import is possible. See the "sqoop-import-all-tables" section of the guide linked above.
3. For the daily incremental loads, see the "incremental import" section of the same guide.
4. About Impala and Sqoop:
a. Sqoop runs the import as MapReduce map tasks (no reducers by default). With the --hive-import option it uses Hive only to create the target table and identify its location; the data itself is never processed by the Hive or Impala query engines. Impala reads table definitions from the same metastore, so there is no separate Impala import option, and choosing Impala instead of Hive would make no difference to the import. The bottom line is that you can keep using the Hive options in your Sqoop script.
b. After the import, it is up to you whether to query the data with Hive or Impala, depending on your requirements. As you mentioned, Impala can replace Hive in certain situations, so use Impala where it is actually needed (for example, for some priority tables).
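For the one-time bulk load in point 2, here is a minimal sketch that only builds the `sqoop import-all-tables` command line as a Python argument list. The JDBC URL, schema user, password-file path and the excluded table name are hypothetical placeholders. `--password-file` and `--exclude-tables` exist from Sqoop 1.4.4 onward, so they are assumptions if you are on the 1.4.1 release the link points to. Per the Sqoop guide, import-all-tables expects every table to have a single-column primary key (otherwise run with one mapper).

```python
import shlex
from typing import List, Optional


def build_import_all_tables(
    jdbc_url: str,
    username: str,
    password_file: str,
    num_mappers: int = 4,
    exclude_tables: Optional[List[str]] = None,
) -> List[str]:
    """Return the argv for a one-time bulk import of every table visible to `username`.

    --hive-import creates each table in the shared metastore, so Impala can query
    it after a metadata refresh; the copy itself runs as map-only MapReduce tasks.
    """
    args = [
        "sqoop", "import-all-tables",
        "--connect", jdbc_url,
        "--username", username,
        # Read the password from a file (Sqoop 1.4.4+) instead of the command line.
        "--password-file", password_file,
        "--hive-import",
        "--num-mappers", str(num_mappers),
    ]
    if exclude_tables:
        # Comma-separated list of tables to skip (Sqoop 1.4.4+).
        args += ["--exclude-tables", ",".join(exclude_tables)]
    return args


cmd = build_import_all_tables(
    "jdbc:oracle:thin:@//oracle-host:1521/ORCL",  # hypothetical host and service name
    "SCOTT",                                      # hypothetical schema owner
    "/user/anis/oracle.password",                 # hypothetical HDFS path
    exclude_tables=["AUDIT_LOG"],                 # hypothetical table to skip
)
# Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building a list instead of a single string avoids shell-quoting mistakes when the same arguments are later handed to a scheduler or to `subprocess.run`.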
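For the daily incremental load in point 3, one common pattern is a saved Sqoop job in `append` mode: the job remembers the highest value of the check column after each run, so a scheduled `sqoop job --exec` pulls only newer rows. The sketch below only assembles the command lines; the job name, table, and `ORDER_ID` check column are hypothetical, and it assumes the table has a steadily increasing key. If rows are updated in place you would need `--incremental lastmodified` instead, which, as far as I know, Sqoop does not accept together with `--hive-import`, so that variant would write to a `--target-dir` with a `--merge-key` instead.

```python
from typing import List


def build_incremental_job(
    job_name: str,
    jdbc_url: str,
    username: str,
    password_file: str,
    table: str,
    check_column: str,
    last_value: str = "0",
) -> List[str]:
    """Return the argv that saves a reusable Sqoop job for append-mode imports.

    A saved job stores the highest `check_column` value seen after each run,
    so every `sqoop job --exec` imports only rows added since the last run.
    """
    return [
        "sqoop", "job", "--create", job_name,
        "--",  # the arguments after "--" belong to the wrapped import tool
        "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--hive-import",
        "--incremental", "append",  # import only rows with check_column > last_value
        "--check-column", check_column,
        "--last-value", last_value,
    ]


def build_job_exec(job_name: str) -> List[str]:
    """Return the argv that runs the saved job; schedule it daily (e.g. cron or Oozie)."""
    return ["sqoop", "job", "--exec", job_name]


create_cmd = build_incremental_job(
    "orders_daily",                               # hypothetical job name
    "jdbc:oracle:thin:@//oracle-host:1521/ORCL",  # hypothetical connection
    "SCOTT",                                      # hypothetical user
    "/user/anis/oracle.password",                 # hypothetical HDFS path
    "ORDERS",                                     # hypothetical table
    "ORDER_ID",                                   # hypothetical increasing key
)
print(" ".join(create_cmd))
print(" ".join(build_job_exec("orders_daily")))
```

Creating the job once and only executing it daily keeps the last imported value inside Sqoop's job metastore, so the daily script does not have to track it.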
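Because Impala caches metastore metadata, tables created or loaded by Sqoop's --hive-import are not always visible to Impala right away. A minimal sketch, assuming you run statements through `impala-shell`: use `INVALIDATE METADATA` the first time Impala needs to see a newly created table, and the cheaper `REFRESH` after new files are appended to a table Impala already knows. The daemon address and table name are hypothetical.

```python
from typing import List


def impala_refresh_statement(table: str, newly_created: bool) -> str:
    """Pick the statement that makes Impala see data loaded outside Impala.

    - A table Impala has never seen (first bulk load) needs INVALIDATE METADATA.
    - New files appended to a table Impala already knows need only REFRESH.
    """
    if newly_created:
        return f"INVALIDATE METADATA {table}"
    return f"REFRESH {table}"


def impala_shell_argv(impalad: str, statement: str) -> List[str]:
    """Return the argv that runs one statement via impala-shell against `impalad` (host:port)."""
    return ["impala-shell", "-i", impalad, "-q", statement]


# Hypothetical daemon address and table: after the one-time bulk load ...
print(impala_shell_argv("impalad-host:21000", impala_refresh_statement("default.orders", True)))
# ... and after each daily incremental run.
print(impala_shell_argv("impalad-host:21000", impala_refresh_statement("default.orders", False)))
```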

Thanks

Kumar
