Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Incremental data loads from relational database table with no maximum value columns

avatar
Expert Contributor

Hi All,

We've some relational tables where there are no audit columns or no max value columns, the way we're figuring out changes in the table is to compare the entire table in source with the corresponding relational table in target and using checksum determine which rows in source have changed or new. This is done via a traditional ETL tool that works with relational databases. If your goal is to do incremental loads from this relational table to a Hive table, can this be done in NiFi or Sqoop and how ?

Thanks for your suggestions.

1 ACCEPTED SOLUTION

avatar
Master Guru

NIFI-3413 describes a GetChangeDataCapture processor, which will generate events that correspond to database updates (rows added, tables changed, e.g.). The hope is that it will support MySQL, Oracle, and SQL Server (although the first release of the processor may not). Development of the MySQL CDC support (via its binlog files) is underway.

View solution in original post

7 REPLIES 7

avatar
Master Guru

NIFI-3413 describes a GetChangeDataCapture processor, which will generate events that correspond to database updates (rows added, tables changed, e.g.). The hope is that it will support MySQL, Oracle, and SQL Server (although the first release of the processor may not). Development of the MySQL CDC support (via its binlog files) is underway.

avatar
Expert Contributor

Thanks @Matt Burgess, how about Sqoop, any options in Sqoop ?

avatar
Master Guru

I'm not familiar enough with Sqoop to know if they have any options that don't involve a max-value column. As you point out, if there's no way from a row to tell if it is "new", then you have to check the whole table.

avatar
Master Mentor

You can do sqoop with where condition to specify your own logic of what to insert, look at free-frm queries but note it's limited to simple queries only https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_free_form_query_imports

avatar
Expert Contributor

thanks @Artem Ervits, but it seems comparing the source and target (by joining them in the same query, for change detection) is not doable in Sqoop's free-form query.

avatar
Master Mentor

@Raj B you can create a view on the RDBMS side to get diff, and sqoop on the view

avatar
Expert Contributor

thanks @Artem Ervits, I'll give it a shot