Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Nifi for MS SQL CDC using dynamic sql query.

Highlighted

Nifi for MS SQL CDC using dynamic sql query.

New Contributor

Hello,

In our legacy architecture, we have a MS SQL server database, this database stores all the sensor information in near real time basis, on average per second it receives 100 records.In order to get complete information about the sensor events, we need to join 2 to 3 tables in the database.

Sample Query:

SELECT SOMETHING FROM TABLE1 AS tab1 INNER JOIN TABLE2 AS tab2 ON tab1.UpdateID=tab2.ID INNER JOIN TABLE3 as tab3 ON tab1.TagID=tab3.ID WHERE tab2.UpdateTime > ${lastExtractUnixTime}

Our requirement is to get the change data capture of above query every 1 minute and post records to Kafka.

Temporarily I am doing CDC using Spark Core JDBC, processing records, sending to Kafka and maintaining CDC information along with ${lastExtractUnixTime} into HBase as Phoenix table. Job is scheduled for every 1 minute batch interval.

As a long term solution, we are planning to use Apache Nifi for doing the CDC thing and post information to Kafka, Spark streaming will read messages from Kafka, will apply some business logic on it and will send the enriched data to the other Kafka topic; I don't find suitable processor, which will help me to dynamically pass the ${lastExtractUnixTime} in SQL and get the delta records every 1 or 2 minutes.

Please suggest how this can be accomplished using Apache Nifi.

Don't have an account?
Coming from Hortonworks? Activate your account here