
What is the maximum number of tables for a Sqoop import?


I have more than 2,000 tables in an RDBMS. Is it possible to import all of them with a single sqoop import-all-tables command?

Also, will there be any performance issues?

Thanks

Mani

Accepted solution


Hi Mani!
This time I'll give my personal opinion: I really prefer to choose the "best fit" for each table. For example, find the best number of mappers for each kind of table, and decide whether it needs compression, a special WHERE clause, or a data type mapped differently from the database to Hadoop. As for performance, I think it's hard to get better performance from import-all-tables than from importing each table separately, especially with as many tables as you have, and even more so if you're planning to run ETL over them.
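A minimal sketch of the "best fit per table" idea, assuming you drive Sqoop from a Python script: each table gets its own settings and therefore its own sqoop import command line. The options used (--connect, --table, --target-dir, --num-mappers, --compress, --where, --split-by, --map-column-java) are standard sqoop import arguments; the TableImport class, the JDBC URL, the table names, and the target directory are hypothetical, and credentials are left out.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TableImport:
    """Hypothetical per-table settings; these names are not part of Sqoop itself."""
    table: str
    num_mappers: int = 4                       # parallelism tuned for this table
    compress: bool = False                     # add --compress for large tables
    where: Optional[str] = None                # optional row filter (--where)
    split_by: Optional[str] = None             # non-default split column (--split-by)
    map_column_java: Dict[str, str] = field(default_factory=dict)  # column -> Java type


def build_import_command(jdbc_url: str, spec: TableImport, target_root: str) -> List[str]:
    """Return the argv for one sqoop import of a single table."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", spec.table,
        "--target-dir", f"{target_root}/{spec.table}",
        "--num-mappers", str(spec.num_mappers),
    ]
    if spec.compress:
        cmd.append("--compress")
    if spec.where:
        cmd += ["--where", spec.where]
    if spec.split_by:
        cmd += ["--split-by", spec.split_by]
    if spec.map_column_java:
        # Sqoop takes comma-separated column=Type pairs, e.g. "id=String,price=Double".
        pairs = ",".join(f"{col}={typ}" for col, typ in spec.map_column_java.items())
        cmd += ["--map-column-java", pairs]
    return cmd


if __name__ == "__main__":
    # Hypothetical connection string and tables; --username/--password etc. omitted.
    jdbc = "jdbc:mysql://db.example.com/sales"
    specs = [
        TableImport("orders", num_mappers=8, compress=True, where="order_date >= '2017-01-01'"),
        TableImport("countries", num_mappers=1),
        TableImport("events", split_by="event_id", map_column_java={"payload": "String"}),
    ]
    for spec in specs:
        # shlex.join (Python 3.8+) quotes arguments such as the WHERE clause for a shell.
        print(shlex.join(build_import_command(jdbc, spec, "/data/raw")))
```

Keeping the settings as data means a small lookup table (or a file) can hold the tuning for thousands of tables, while anything not listed falls back to the defaults.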

As I said, this is just my opinion. You can also check the documentation, which lists some requirements for using import-all-tables:

http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal

  • Each table must have a single-column primary key.
  • You must intend to import all columns of each table.
  • You must not intend to use a non-default splitting column, nor impose any conditions via a WHERE clause.
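With 2,000+ tables, one way to see how many even qualify for import-all-tables is to check each table against the three rules above. This is a Python sketch under the assumption that you already have per-table metadata (for example, primary-key columns read from the database catalog); TableInfo, its fields, and the helper functions are hypothetical, not part of Sqoop.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class TableInfo:
    """Hypothetical metadata you would gather from the database catalog."""
    name: str
    primary_key: List[str] = field(default_factory=list)  # PK columns; empty if no PK
    column_subset: bool = False            # True if only some columns should be imported
    split_by: Optional[str] = None         # non-default splitting column, if any
    where: Optional[str] = None            # row filter, if any


def import_all_blockers(t: TableInfo) -> List[str]:
    """List which of the import-all-tables rules this table violates."""
    reasons = []
    if len(t.primary_key) != 1:
        reasons.append("no single-column primary key")
    if t.column_subset:
        reasons.append("needs a subset of columns")
    if t.split_by is not None:
        reasons.append("needs a non-default split column")
    if t.where is not None:
        reasons.append("needs a WHERE condition")
    return reasons


def partition_tables(tables: Iterable[TableInfo]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split tables into import-all-tables candidates and ones needing their own import."""
    eligible: List[str] = []
    individual: Dict[str, List[str]] = {}
    for t in tables:
        reasons = import_all_blockers(t)
        if reasons:
            individual[t.name] = reasons
        else:
            eligible.append(t.name)
    return eligible, individual


if __name__ == "__main__":
    tables = [
        TableInfo("customers", ["customer_id"]),
        TableInfo("order_lines", ["order_id", "line_no"]),   # composite key
        TableInfo("audit_log"),                              # no primary key
        TableInfo("orders", ["order_id"], where="status = 'CLOSED'"),
    ]
    eligible, individual = partition_tables(tables)
    print("import-all-tables candidates:", eligible)
    for name, reasons in individual.items():
        print(f"import separately: {name} ({'; '.join(reasons)})")
```

Tables in the second group would each get their own sqoop import with whatever settings they need. Newer Sqoop user guides (1.4.6, for example) also document an --exclude-tables option for import-all-tables that could take that group; check that your version supports it.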

Another good reason to prefer per-table imports is that you may run into problems when importing all tables at once. Debugging will take longer, and if you try to fix an issue affecting one of the last tables, you'll have to wait for the whole job to finish to see whether the fix worked 😄
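To keep failures isolated, you could run each table's import as its own process, record which tables failed, and re-run only those after a fix instead of the whole batch. This Python sketch assumes you already have one sqoop import command per table; run_imports, retry_failed, and the runner parameter are hypothetical helpers (the runner lets you swap in a fake for testing), and the imports run sequentially for simplicity.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run_with_subprocess(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code (127 if the binary is not found)."""
    try:
        return subprocess.run(list(cmd), check=False).returncode
    except FileNotFoundError:
        return 127


def run_imports(commands: Dict[str, List[str]],
                runner: Runner = run_with_subprocess) -> Dict[str, int]:
    """Run each table's import on its own; return {table: exit_code} for failures only."""
    failures: Dict[str, int] = {}
    for table, cmd in commands.items():
        code = runner(cmd)
        if code != 0:
            failures[table] = code
    return failures


def retry_failed(commands: Dict[str, List[str]], failures: Dict[str, int],
                 runner: Runner = run_with_subprocess) -> Dict[str, int]:
    """Re-run only the tables that failed, e.g. after fixing their settings."""
    return run_imports({t: commands[t] for t in failures}, runner)


if __name__ == "__main__":
    jdbc = "jdbc:mysql://db.example.com/sales"   # hypothetical connection string
    commands = {
        t: ["sqoop", "import", "--connect", jdbc, "--table", t, "--target-dir", f"/data/raw/{t}"]
        for t in ["customers", "orders", "order_lines"]
    }
    failed = run_imports(commands)
    print("failed tables:", sorted(failed))
```

A failure in one table then costs only that table's re-run, which is exactly the debugging advantage described above.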
Hope this helps!

Hi, thank you. It helps a lot!

---

Thanks

Mani


Good to know @Mani!
If you found the answer helpful and it answered your question completely, please accept it as the answer. This encourages other HCC users to keep contributing, and it helps others find the answer faster by searching for the "best answer".
Thanks! 🙂