Support Questions
Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Innovation Accelerator group hub.

Incremental data loads from relational database table with no maximum value columns

Rising Star

Hi All,

We've some relational tables where there are no audit columns or no max value columns, the way we're figuring out changes in the table is to compare the entire table in source with the corresponding relational table in target and using checksum determine which rows in source have changed or new. This is done via a traditional ETL tool that works with relational databases. If your goal is to do incremental loads from this relational table to a Hive table, can this be done in NiFi or Sqoop and how ?

Thanks for your suggestions.

1 ACCEPTED SOLUTION

Super Guru

NIFI-3413 describes a GetChangeDataCapture processor, which will generate events that correspond to database updates (rows added, tables changed, e.g.). The hope is that it will support MySQL, Oracle, and SQL Server (although the first release of the processor may not). Development of the MySQL CDC support (via its binlog files) is underway.

View solution in original post

7 REPLIES 7

Super Guru

NIFI-3413 describes a GetChangeDataCapture processor, which will generate events that correspond to database updates (rows added, tables changed, e.g.). The hope is that it will support MySQL, Oracle, and SQL Server (although the first release of the processor may not). Development of the MySQL CDC support (via its binlog files) is underway.

Rising Star

Thanks @Matt Burgess, how about Sqoop, any options in Sqoop ?

Super Guru

I'm not familiar enough with Sqoop to know if they have any options that don't involve a max-value column. As you point out, if there's no way from a row to tell if it is "new", then you have to check the whole table.

Mentor

You can do sqoop with where condition to specify your own logic of what to insert, look at free-frm queries but note it's limited to simple queries only https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_free_form_query_imports

Rising Star

thanks @Artem Ervits, but it seems comparing the source and target (by joining them in the same query, for change detection) is not doable in Sqoop's free-form query.

Mentor

@Raj B you can create a view on the RDBMS side to get diff, and sqoop on the view

Rising Star

thanks @Artem Ervits, I'll give it a shot