Created 03-07-2017 05:37 PM
Hi All,
We've some relational tables where there are no audit columns or no max value columns, the way we're figuring out changes in the table is to compare the entire table in source with the corresponding relational table in target and using checksum determine which rows in source have changed or new. This is done via a traditional ETL tool that works with relational databases. If your goal is to do incremental loads from this relational table to a Hive table, can this be done in NiFi or Sqoop and how ?
Thanks for your suggestions.
Created 03-07-2017 08:29 PM
NIFI-3413 describes a GetChangeDataCapture processor, which will generate events that correspond to database updates (rows added, tables changed, e.g.). The hope is that it will support MySQL, Oracle, and SQL Server (although the first release of the processor may not). Development of the MySQL CDC support (via its binlog files) is underway.
Created 03-07-2017 08:29 PM
NIFI-3413 describes a GetChangeDataCapture processor, which will generate events that correspond to database updates (rows added, tables changed, e.g.). The hope is that it will support MySQL, Oracle, and SQL Server (although the first release of the processor may not). Development of the MySQL CDC support (via its binlog files) is underway.
Created 03-07-2017 09:30 PM
Thanks @Matt Burgess, how about Sqoop, any options in Sqoop ?
Created 03-08-2017 02:44 PM
I'm not familiar enough with Sqoop to know if they have any options that don't involve a max-value column. As you point out, if there's no way from a row to tell if it is "new", then you have to check the whole table.
Created 03-08-2017 02:33 PM
You can do sqoop with where condition to specify your own logic of what to insert, look at free-frm queries but note it's limited to simple queries only https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_free_form_query_imports
Created 03-08-2017 03:13 PM
thanks @Artem Ervits, but it seems comparing the source and target (by joining them in the same query, for change detection) is not doable in Sqoop's free-form query.
Created 03-08-2017 03:26 PM
@Raj B you can create a view on the RDBMS side to get diff, and sqoop on the view
Created 03-09-2017 02:51 PM
thanks @Artem Ervits, I'll give it a shot