Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Limiting Sqoop imports to some time intervals

avatar
Rising Star

Hi,

we are trying to setup ingest of data from databases into Hive using Sqoop. Problem is that databases are used in production and we cannot use them too heavily during some working hours. There are many tables and some of them are quite huge ( > 2 GRows) so it is probable that we cannot ingest them all during the time window available. It is difficult to create some general delta queries that would run some given amount of time and no longer.

I am thinking about possibility to implement such feature directly into Sqoop. I am not very familiar with Sqoop implementation, but I guess there is some loop where a row gets loaded from JDBC resultset, converted and stored into Hive table.

All that would be required is to place a check in this loop and wait/sleep for some time if this is happening during the database working hours. This way the ingest will run at the full speed outside of the working hours and will be significantly reduced (but still running) when database is not supposed to be overloaded.

What do you think? Does this sound like a feature that could be useful to someone else as well?

Thanks,

Pavel

1 ACCEPTED SOLUTION

avatar
Rising Star
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login
3 REPLIES 3

avatar
Rising Star
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar
@Pavel Benes

This is an interesting idea. Typically there are various ways to ensure that large quantity of data is ingested successfully, like running frequent smaller updates, running more mappers during sqoop job, increasing infrastructure (obviously last resort).

I am not able to see how we reliably pull the data by setting a timer based stop on the process. This is definitely worth the discussion though.

Can you may be open a new thread (as idea and not as question) with some details around design? We should be able to brainstorm this.

avatar
Master Mentor

@Pavel Benes has this been resolved? Can you post your solution or accept best answer?