We are trying to incrementally import all of our external RDBMS tables into Hive using Sqoop jobs. There are more than 1,500 tables across almost 30 databases. We tested incremental imports on individual tables against our requirements and have written Sqoop jobs accordingly.
We want the import to be incremental only: whenever a source table is updated or a new record is added, that change needs to be added to our Hive tables. The only option we see for incremental data ingestion is a Sqoop job.
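A minimal sketch of how one saved incremental Sqoop job can be defined programmatically, so the same definition can be stamped out per table. The function name `build_sqoop_job_cmd`, the JDBC URL, user, password-file path, table names and key columns are all hypothetical. The sketch assumes each table has a monotonically increasing key column and uses `--incremental append`, which only picks up new rows. Capturing updated rows would need `--incremental lastmodified` instead; as far as we are aware, Sqoop 1.4.x rejects that mode in combination with `--hive-import`. Because the job is saved, Sqoop stores the last imported value in its metastore, so each `sqoop job --exec <job_name>` only fetches rows newer than the previous run.

```python
import shlex


def build_sqoop_job_cmd(job_name, jdbc_url, user, password_file,
                        table, check_column, hive_table, last_value="0"):
    """Return the argv for `sqoop job --create` defining an incremental append import.

    Assumption: check_column is a monotonically increasing key, so
    append mode sees new rows only (not updates to existing rows).
    """
    return [
        "sqoop", "job", "--create", job_name,
        "--",  # everything after "--" is the tool invocation stored in the job
        "import",
        "--connect", jdbc_url,
        "--username", user,
        "--password-file", password_file,  # avoids a password prompt on --exec
        "--table", table,
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", last_value,  # starting point; the metastore updates it after each run
        "--hive-import",
        "--hive-table", hive_table,
    ]


# Hypothetical table inventory: (source table, increasing key column)
TABLES = [("orders", "order_id"), ("customers", "customer_id")]

if __name__ == "__main__":
    for table, key in TABLES:
        cmd = build_sqoop_job_cmd(
            job_name=f"sales_{table}_incr",
            jdbc_url="jdbc:mysql://db.example.com/sales",
            user="etl",
            password_file="/user/etl/.password",
            table=table,
            check_column=key,
            hive_table=f"sales.{table}",
        )
        # Print a shell-safe command line that could be run once per table
        print(shlex.join(cmd))
```

Generating the commands from an inventory (rather than writing each job by hand) keeps the per-table definitions consistent; the inventory itself could come from a config file or from querying each source database's catalog.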
However, writing this many Sqoop jobs, scheduling them to run daily, and maintaining them would take a great deal of effort and time, which is not practical for us.
Since Cloudera is targeted at big data, there must be a solution for loading data from a large number of different tables into Hive.
Please help us create a solution that imports data from all the source tables incrementally and runs on a daily schedule, so that we don't miss any data.