Support Questions


Can NiFi be used to pipe data from sql server CDC to Kafka?

New Contributor
1 ACCEPTED SOLUTION

Guru

The SQL Server Enterprise CDC feature can be directed to write change data to a table in SQL Server, effectively keeping an audit log in a SQL table. That table can then be read with the NiFi QueryDatabaseTable processor. It works much like a trigger writing to an audit table, which you can then read incrementally with NiFi. So the short answer is yes: you would create a flow of QueryDatabaseTable (using the change timestamp as the maximum-value column) -> PutKafka. Depending on the batch size you need for good throughput, you may need a SplitAvro processor between the two, but that's another question 🙂
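For readers who want to see the pattern outside of NiFi, here is a minimal Python sketch of what QueryDatabaseTable -> PutKafka does: poll the CDC/audit table incrementally using a timestamp watermark and publish each new row to a Kafka topic. The table name (dbo.orders_audit), watermark column (last_update), connection details, and topic name are all assumptions for illustration; it uses the pyodbc and kafka-python packages.

```python
# Minimal sketch of the QueryDatabaseTable -> PutKafka pattern outside NiFi.
# Assumptions: a CDC/audit table dbo.orders_audit with a last_update datetime
# column, a reachable SQL Server and Kafka broker, and pyodbc / kafka-python installed.
import json
import time

import pyodbc
from kafka import KafkaProducer

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver.example.com;DATABASE=sales;UID=nifi;PWD=secret"
)
TOPIC = "sqlserver-cdc"           # hypothetical Kafka topic
WATERMARK_COLUMN = "last_update"  # hypothetical maximum-value column

producer = KafkaProducer(
    bootstrap_servers="kafka.example.com:9092",
    value_serializer=lambda v: json.dumps(v, default=str).encode("utf-8"),
)

# NiFi keeps this watermark in processor state; here it lives in memory.
last_seen = None

while True:
    with pyodbc.connect(CONN_STR) as conn:
        cursor = conn.cursor()
        if last_seen is None:
            cursor.execute("SELECT * FROM dbo.orders_audit ORDER BY last_update")
        else:
            cursor.execute(
                "SELECT * FROM dbo.orders_audit WHERE last_update > ? ORDER BY last_update",
                last_seen,
            )
        columns = [col[0] for col in cursor.description]
        for row in cursor.fetchall():
            record = dict(zip(columns, row))
            producer.send(TOPIC, record)          # one message per change record
            last_seen = record[WATERMARK_COLUMN]  # advance the watermark
    producer.flush()
    time.sleep(10)  # comparable to the processor's run schedule
```

In the real flow, QueryDatabaseTable handles the watermark state and batching for you, and SplitAvro would sit between it and the Kafka processor to break large result sets into smaller messages.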

That said, there are other forms of CDC that can be more efficient but currently require third-party products. Passive (log-based) CDC, which reads the SQL Server transaction logs and therefore puts no additional query load on the database, is an option here. Tools such as Attunity Replicate use this approach and can write directly to Kafka. See this webinar for an example of how these tools work and how you can use NiFi with them.


2 REPLIES


NiFi currently doesn't support CDC directly from SQL Server transaction logs, but you can use database triggers to write change records to an audit table, poll that table frequently with NiFi, and publish the change records to Kafka.
