Created on 01-31-2017 06:18 AM - edited 09-16-2022 03:59 AM
Dear Cloudera Community,
I am new to this community and would like to know how to do a one-time bulk load from an Oracle database, followed by daily incremental loads, into Impala using Sqoop or any other tool, without using Hive at all. I assume Impala is a complete replacement for Hive in certain situations.
Regards
Anis
Created 01-31-2017 12:24 PM
1. Please refer to the official Sqoop user guide to learn more about Sqoop. Change the version in the URL to match your Sqoop version:
https://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html
2. Yes, bulk import is possible. Please refer to the "sqoop-import-all-tables" topic in the guide linked above.
3. About incremental loads: please refer to the "incremental import" topic in the same guide.
4. About Impala for Sqoop:
a. Sqoop runs its imports as MapReduce jobs that use mappers only (no reducers by default). It refers to the Hive database/table just to identify the target location, and it never uses the Hive or Impala engine to perform the import. So choosing Impala or Hive makes no difference to the import itself, which is why Sqoop provides only a hive-import option. The bottom line is that you can continue to use the Hive options in your Sqoop script.
b. After the data is imported, it is up to you whether to query it with Hive or Impala, depending on your requirements. As you mentioned, Impala fits certain situations, so please use Impala only where it is necessary (for example, for some priority tables).
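To make the steps above concrete, here is a rough sketch of how the bulk load, the daily incremental load, and the Impala refresh could be scripted. It is only an illustration under assumptions: the connection string, user, password file, table, check column, HDFS directory, and Impala host are all made-up placeholders, and it assumes an external Impala table already exists over the target directory. It writes to a plain HDFS directory instead of using the Hive options, which is one possible variation, not the only way to do it.

```shell
#!/bin/sh
# Sketch only. Every host, path, user, table, and column below is a
# hypothetical placeholder; replace them with your own values.
set -eu

ORACLE_URL="jdbc:oracle:thin:@//oracle-host.example.com:1521/ORCL"
ORACLE_USER="etl_user"
PASSWORD_FILE="/user/etl/.oracle.pw"   # file holding the database password
TABLE="ORDERS"
CHECK_COLUMN="ORDER_ID"                # assumed to increase for every new row
TARGET_DIR="/data/warehouse/orders"    # HDFS directory Sqoop writes to
IMPALA_HOST="impalad.example.com"
IMPALA_TABLE="sales.orders"            # assumed external table over TARGET_DIR

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# One-time bulk load of the whole table into HDFS.
bulk_load() {
  run sqoop import \
    --connect "$ORACLE_URL" --username "$ORACLE_USER" \
    --password-file "$PASSWORD_FILE" \
    --table "$TABLE" --target-dir "$TARGET_DIR" \
    --num-mappers 4
}

# Daily incremental load: import only rows whose check column is greater
# than the last value seen by the previous run (passed as $1).
incremental_load() {
  run sqoop import \
    --connect "$ORACLE_URL" --username "$ORACLE_USER" \
    --password-file "$PASSWORD_FILE" \
    --table "$TABLE" --target-dir "$TARGET_DIR" \
    --incremental append --check-column "$CHECK_COLUMN" --last-value "$1"
}

# Tell Impala to pick up the new data files added under the table's directory.
refresh_impala() {
  run impala-shell -i "$IMPALA_HOST" -q "REFRESH $IMPALA_TABLE"
}
```

With DRY_RUN left at its default, calling bulk_load, incremental_load 1000, or refresh_impala only prints the command, so you can check it before running anything against the cluster. For the daily run you need to carry the last imported value forward yourself, or let a saved Sqoop job (sqoop job --create) keep track of it between runs.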
Thanks
Kumar