- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Incremental data loads from relational database table with no maximum value columns
- Labels:
-
Apache NiFi
-
Apache Sqoop
Created ‎03-07-2017 05:37 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
We've some relational tables where there are no audit columns or no max value columns, the way we're figuring out changes in the table is to compare the entire table in source with the corresponding relational table in target and using checksum determine which rows in source have changed or new. This is done via a traditional ETL tool that works with relational databases. If your goal is to do incremental loads from this relational table to a Hive table, can this be done in NiFi or Sqoop and how ?
Thanks for your suggestions.
Created ‎03-07-2017 08:29 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
NIFI-3413 describes a GetChangeDataCapture processor, which will generate events that correspond to database updates (rows added, tables changed, e.g.). The hope is that it will support MySQL, Oracle, and SQL Server (although the first release of the processor may not). Development of the MySQL CDC support (via its binlog files) is underway.
Created ‎03-07-2017 08:29 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
NIFI-3413 describes a GetChangeDataCapture processor, which will generate events that correspond to database updates (rows added, tables changed, e.g.). The hope is that it will support MySQL, Oracle, and SQL Server (although the first release of the processor may not). Development of the MySQL CDC support (via its binlog files) is underway.
Created ‎03-07-2017 09:30 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks @Matt Burgess, how about Sqoop, any options in Sqoop ?
Created ‎03-08-2017 02:44 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I'm not familiar enough with Sqoop to know if they have any options that don't involve a max-value column. As you point out, if there's no way from a row to tell if it is "new", then you have to check the whole table.
Created ‎03-08-2017 02:33 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You can do sqoop with where condition to specify your own logic of what to insert, look at free-frm queries but note it's limited to simple queries only https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_free_form_query_imports
Created ‎03-08-2017 03:13 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
thanks @Artem Ervits, but it seems comparing the source and target (by joining them in the same query, for change detection) is not doable in Sqoop's free-form query.
Created ‎03-08-2017 03:26 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Raj B you can create a view on the RDBMS side to get diff, and sqoop on the view
Created ‎03-09-2017 02:51 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
thanks @Artem Ervits, I'll give it a shot
