14308-cdcdataflowtop.png

The QueryDatabaseTable processor can easily ingest data from a table based on an incrementing key. A sequence ID or an autogenerated primary key, as PostgreSQL and MariaDB provide, is ideal. You can also use an incrementing date or an Oracle sequence ID. As long as the value increases with every new row, you can set it as the processor's Maximum-value Column. If your tables don't have such a column, you could write a trigger or procedure in your database that copies each change to a transaction table with an autogenerated ID, and NiFi will pick up the rows from there.
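The incremental fetch idea can be pictured outside NiFi. This is only a minimal sketch, assuming an in-memory SQLite table as a stand-in for the source database; the fetch_new_rows helper and the state variable are hypothetical names for illustration, not NiFi APIs. It remembers the largest key seen so far and asks only for rows above it.

```python
import sqlite3

def fetch_new_rows(conn, last_max):
    """Return rows whose trialid is above last_max, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT trialid, trialdescription FROM trials "
        "WHERE trialid > ? ORDER BY trialid",
        (last_max,),
    ).fetchall()
    # If nothing new arrived, keep the previous high-water mark.
    new_max = rows[-1][0] if rows else last_max
    return rows, new_max

# Stand-in source table with an autogenerated, incrementing key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trials ("
    "trialid INTEGER PRIMARY KEY AUTOINCREMENT, trialdescription TEXT)"
)
conn.executemany(
    "INSERT INTO trials (trialdescription) VALUES (?)", [("A",), ("B",)]
)

# First run picks up everything; later runs pick up only rows added since.
first_batch, state = fetch_new_rows(conn, 0)
conn.execute("INSERT INTO trials (trialdescription) VALUES (?)", ("C",))
second_batch, state = fetch_new_rows(conn, state)
```

Rows that are updated or deleted in place never raise the key, which is why this approach only captures inserts.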

Clearly, real CDC involves reading write-ahead logs or transaction logs at a deep level and capturing every change. That is coming, and it can already be done with tools like Attunity + NiFi.

For the use cases I have, I just need to grab new rows as they are added to a table, and I control the ID.

14309-cdcdataflowlower.png

14310-cdcrouteonattribute.png

I convert from Avro to JSON so I can extract attributes, since I want to route based on column values. One field in the table determines where I land the data: HBase (and Phoenix), HDFS, or Hive.
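The routing decision amounts to a lookup on one column value. Here is a minimal sketch, assuming a hypothetical routing field named trialtype and hypothetical values for it; the article does not name the actual field or values, and in the flow itself this decision is made by the routing processor rather than by custom code.

```python
import json

# Hypothetical mapping from a column value to a landing zone.
ROUTES = {"clinical": "hbase", "archive": "hdfs", "reporting": "hive"}

def destination_for(record_json, field="trialtype", default="hdfs"):
    """Pick a destination from one field of a JSON record."""
    record = json.loads(record_json)
    value = str(record.get(field, "")).lower()
    # Unknown or missing values fall back to the default destination.
    return ROUTES.get(value, default)
```

For example, destination_for('{"trialid": 1, "trialtype": "Clinical"}') selects the HBase route, and a record without the field falls through to the default.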

14311-cdcsplit.png

I split my records for easy processing.

One thing I highly recommend, for SQL safety and to prevent errors: pass values as typed SQL parameters instead of building them into the SQL string.

Example SQL for CDC:

upsert into trials (trialid, trialdescription, fileName) values (1,'FENTANYL','5ab2d068-dd53-4674-bcf8-17f7d80d0553')

CREATE EXTERNAL TABLE IF NOT EXISTS trials2 (trialid INT, trialdescription STRING, trialtype STRING) STORED AS ORC
location '/hiveorc'

CREATE TABLE trials (trialid integer not null primary key, trialdescription varchar, filename varchar);


14315-sqlupdateattribute.png

Set your SQL attributes for SQL safety. The types are the numeric values of the JDBC types (java.sql.Types): 12 is VARCHAR (string) and -5 is BIGINT.
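As an illustration of what those attributes look like, here is a minimal sketch that builds the sql.args.N.type and sql.args.N.value pairs for a list of parameters. The sql_args helper is a hypothetical name used only for this example; in the flow the attributes are set with UpdateAttribute. The type codes are the constants from java.sql.Types.

```python
# JDBC type codes as defined by java.sql.Types (only the ones used here).
JDBC_TYPES = {"INTEGER": 4, "VARCHAR": 12, "BIGINT": -5}

def sql_args(params):
    """Build sql.args.N.type / sql.args.N.value attributes.

    params is a list of (jdbc_type_name, value) pairs, in the same
    order as the ? placeholders in the SQL statement.
    """
    attrs = {}
    for n, (type_name, value) in enumerate(params, start=1):
        # Attribute values are strings, so the numeric code is stringified.
        attrs[f"sql.args.{n}.type"] = str(JDBC_TYPES[type_name])
        attrs[f"sql.args.{n}.value"] = str(value)
    return attrs

attrs = sql_args([("BIGINT", 1), ("VARCHAR", "FENTANYL")])
```

The numbering starts at 1 and follows the order of the parameters in the statement.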

14314-sqlstring.png

Then your SQL is standard JDBC syntax, with a ? as the placeholder for each parameter.
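To see why placeholders help, here is a minimal sketch using Python's sqlite3 module, which also uses ? placeholders. SQLite is only a stand-in: it has no Phoenix-style upsert statement, so INSERT OR REPLACE is assumed as a rough equivalent, and the second row is made-up illustrative data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trials ("
    "trialid INTEGER NOT NULL PRIMARY KEY, "
    "trialdescription TEXT, filename TEXT)"
)

# One statement, reused for every row; values are bound, never concatenated.
sql = (
    "INSERT OR REPLACE INTO trials (trialid, trialdescription, filename) "
    "VALUES (?, ?, ?)"
)
conn.execute(sql, (1, "FENTANYL", "5ab2d068-dd53-4674-bcf8-17f7d80d0553"))

# A value containing a quote would break naive string building,
# but it is stored as-is when bound as a parameter.
conn.execute(sql, (2, "O'BRIEN STUDY", "example-file"))

rows = conn.execute(
    "SELECT trialid, trialdescription FROM trials ORDER BY trialid"
).fetchall()
```

Because the statement text never changes, a stray quote or malformed value in the data cannot alter the SQL that is executed.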

Here is some cool data: I used the Google Location API, called over REST from NiFi, to enrich some data and get latitude and longitude from a vague location. Vague locations like this show up in Twitter data all the time.

14312-tweetswithgooglelocation.png

Reference:

https://www.mockaroo.com/

https://community.hortonworks.com/articles/51902/incremental-fetch-in-nifi-with-querydatabasetable.h...

14313-mockaroo.png

Version history: revision 2 of 2, last updated 08-17-2019 01:26 PM