What is the maximum table limit for a Sqoop import?
- Labels: Apache Sqoop
Created 07-14-2018 06:33 PM
I have more than 2k tables in an RDBMS. Is it possible to import all of them with a single sqoop import-all-tables command?
Also, will there be any performance issues?
Thanks
Mani
Created 07-15-2018 07:49 AM
Hi Mani!
This time I'll give my personal opinion: I really prefer to choose the "best fit" for each table. For example, find the best number of mappers for each kind of table, decide whether it needs compression or a special WHERE clause, and map a datatype from the DB to a different one in Hadoop. As for performance, I'd guess it's hard to get better performance with import-all-tables than by importing each table separately, especially with as many tables as you have, and even more so if you're planning to run ETL over them.
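As a rough sketch of that "best fit" idea, assuming you drive the imports from a small Python script: each table gets its own plan and its own sqoop import command. The TablePlan class, the build_import_cmd helper, the table names, the JDBC URL and the paths are all hypothetical; the flags used (--num-mappers, --split-by, --where, --compress, --compression-codec, --map-column-java) are regular sqoop import options, but please double-check them against your Sqoop version.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class TablePlan:
    """Hypothetical per-table import settings."""
    table: str
    num_mappers: int = 4
    split_by: Optional[str] = None   # column used to split work across mappers
    where: Optional[str] = None      # row filter passed to --where
    compress: bool = False
    java_types: Dict[str, str] = field(default_factory=dict)  # column -> Java type

def build_import_cmd(plan: TablePlan, connect: str, username: str,
                     target_root: str) -> List[str]:
    """Build the argv for one `sqoop import` tailored to a single table."""
    cmd = [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "-P",  # prompt for the password on the console
        "--table", plan.table,
        "--target-dir", f"{target_root}/{plan.table}",
        "--num-mappers", str(plan.num_mappers),
    ]
    if plan.split_by:
        cmd += ["--split-by", plan.split_by]
    if plan.where:
        cmd += ["--where", plan.where]
    if plan.compress:
        cmd += ["--compress",
                "--compression-codec", "org.apache.hadoop.io.compress.SnappyCodec"]
    if plan.java_types:
        mapping = ",".join(f"{col}={typ}" for col, typ in plan.java_types.items())
        cmd += ["--map-column-java", mapping]
    return cmd

# Example plans for two hypothetical tables.
plans = [
    TablePlan("orders", num_mappers=8, split_by="order_id", compress=True),
    TablePlan("audit_log", num_mappers=1, where="created_at >= '2018-01-01'",
              java_types={"payload": "String"}),
]
for p in plans:
    argv = build_import_cmd(p, "jdbc:mysql://dbhost/sales", "etl", "/data/raw")
    # Quote each argument so the printed command is safe to paste into a shell.
    print(" ".join(shlex.quote(a) for a in argv))
```

Building argument lists instead of one big shell string keeps WHERE clauses with spaces and quotes intact.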
Like I said above, it's just my humble opinion. You can also take a look at the documentation; there are some rules for using import-all-tables:
http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal
- Each table must have a single-column primary key.
- You must intend to import all columns of each table.
- You must not intend to use non-default splitting column, nor impose any conditions via a WHERE clause.
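With 2k+ tables, the first rule is usually the one that bites. A minimal sketch, assuming you have already pulled each table's primary-key columns from your database's catalog (how you do that depends on the RDBMS): tables with exactly one PK column can go through import-all-tables, and the rest get their own imports. The function names and sample tables are hypothetical, and --exclude-tables is not available in every Sqoop release, so check that your version supports it before relying on it.

```python
from typing import Dict, List, Tuple

def split_by_pk_rule(pk_columns: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Split tables into those that meet import-all-tables' single-column
    primary key rule and those that need their own `sqoop import`."""
    eligible: List[str] = []
    individual: List[str] = []
    for table, cols in sorted(pk_columns.items()):
        if len(cols) == 1:
            eligible.append(table)
        else:
            # No primary key or a composite key: import separately,
            # e.g. with an explicit --split-by or a single mapper.
            individual.append(table)
    return eligible, individual

def build_import_all_cmd(connect: str, username: str, warehouse_dir: str,
                         excluded: List[str]) -> List[str]:
    """Build an import-all-tables command that skips the excluded tables.
    Assumes a Sqoop release that supports --exclude-tables."""
    cmd = [
        "sqoop", "import-all-tables",
        "--connect", connect,
        "--username", username,
        "-P",  # prompt for the password on the console
        "--warehouse-dir", warehouse_dir,
    ]
    if excluded:
        cmd += ["--exclude-tables", ",".join(excluded)]
    return cmd
```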
Another good reason to import tables individually is that you may run into problems while importing all tables at once. Debugging them will take longer, and if you attempt to fix an issue with one of the last tables, you'll get bored waiting for the whole job to finish just to see whether the fix worked 😄
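To keep that feedback loop short, here is a minimal sketch that runs one sqoop import per table and collects the failures, so you can rerun just those tables instead of the whole batch. import_tables, run_with_subprocess and the runner parameter are hypothetical names; the runner is injectable so the loop can be tried without a cluster.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

Runner = Callable[[Sequence[str]], int]

def run_with_subprocess(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code (non-zero means failure)."""
    return subprocess.run(list(cmd), check=False).returncode

def import_tables(commands: Dict[str, List[str]],
                  runner: Runner = run_with_subprocess) -> List[str]:
    """Run one import per table, in order, and return the tables that failed."""
    failed: List[str] = []
    for table, cmd in commands.items():
        code = runner(cmd)
        if code != 0:
            print(f"import of {table} failed with exit code {code}")
            failed.append(table)
    return failed

# After fixing a problem, retry only the tables that failed, e.g.:
#   failed = import_tables(commands)
#   failed = import_tables({t: commands[t] for t in failed})
```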
Hope this helps!
Created 07-18-2018 04:48 AM
Hi, thank you. It helps a lot...
---
Thanks
Mani
Created 07-18-2018 05:07 AM
Good to know @Mani!
I'd kindly ask: if you found the answer helpful and it answered your question completely, please accept it as the answer. This will encourage other HCC users to keep doing a good job, and others will find the answer faster by just searching for the "best answer".
Thanks! 🙂
