Created on 07-18-201702:30 PM - edited 08-17-201912:00 PM
This tutorial is the final article of a three part series. We will look at the flow components in greater detail to highlight best practices and potential trouble spots when working with CDC in NiFi. Part 1Part 2
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6
Apache NiFi 1.3.0
CDC Flow Configuration Details, Best Practices & Troubleshooting
It's helpful to have completed the first two articles in this series(Part 1 ; Part 2) to have a point of reference for the items discussed below. But, this article will hopefully still be useful for those who are interested in some of the nuances of the CaptureChangeMysql, EnforceOrder and PutDatabaseRecord processors.
State in CaptureChangeMysql
The CaptureChangeMySQL processor has a "Distributed Map Cache Client" property which is set to a DistributedMapCacheClientService controller service to store State information and keep it updated. The DistributedMapCacheClientService controller service communicates with a DistributedMapCacheServer. The CaptureChangeMySql processor can be configured to monitor thousands of tables. Using this controller service to monitor the state information provides this flexibility.
If you stop the CaptureChangeMySQL processor after it has run some binary logs, highlight it and right-click, you will see the option "View State" in the context menu. Selecting that option will show you the state information that has been stored:
Note: This is the default behavior, with the State Update Interval set to 0 seconds, meaning the state is updated when the processor is stopped or shutdown.
If you are debugging CaptureChangeMysql in your flow and wondering why data is not being processed, make sure to "Clear state" so that the processor will look for events at the beginning of your bin logs again.
State information is also stored for the EnforceOrder processor.
If you are debugging EnforceOrder in your flow and wondering why data is not being processed, make sure to "Clear state" to reset.
EnforceOrder in a Clustered Environment
The EnforceOrder processor is used to guarantee the sequence of events. In the tutorial's flow, it plays this crucial role to make sure the PutDatabaseRecord processor applies the SQL from the bin logs in the proper order. If you are running NiFi in a cluster, processors prior to EnforceOrder could be multi-threaded and/or run on multiple nodes to optimize performance. However, the EnforceOrder processor must be running on the primary node and single-threaded:
PutDatabaseRecord Database Connection Pooling Service
The PutDatabaseRecord processor requires a DBCPConnectionPool controller service. This controller service has a property called "Validation query" which is used to verify connections before returning them:
Note: Using query validation may have a performance penalty.
PutDatabaseRecord "Rollback On Failure" Property
The PutDatabaseRecord processor has a "Rollback on Failure" property:
The default value is "false", but setting to "true" will stop the processing of incoming flowfiles if one has failed. Effectively, this stops the processor even if it is running and doesn't allow further processing until the failed flowfile is corrected.
Note: The "Rollback on Failure" property exists in the PutHiveQL, PutSQL, and PutHiveStreaming processors as well.