We consume two types of messages from a Kafka topic. On the happy path, the first message is published and inserted into the DB; the second is published and the NiFi flow updates that record's status to processed. What we are seeing is a race condition between the two flows, resulting in two records in our DB.

I have put a merge SQL statement in both PutSQL processors. The first one is: search for the record; if not found, insert; if found, do nothing. The second one is: search for the record; if found, update; if not found, insert. We are using the same lookup values in both statements. Looking at the timestamps in the DB, the two records are about 200 ms apart. If I set the processor to run on the primary node only, it cannot keep up with the volume.

Originally I had just an UPDATE statement, but we saw that records were not getting updated. We suspected the update was not finding the record because the record was being written to the DB at the same moment, so we changed it to an upsert (merge) instead, and now we see duplicates. The messages coming in from Kafka are published about 200 ms apart from each other.
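For clarity, here is a sketch of the two statements as I described them. The table and column names (`events`, `msg_key`, `status`) are placeholders, and the exact MERGE syntax depends on the database vendor (this sketch uses Oracle-style syntax):

```sql
-- First flow (PutSQL #1): insert if absent, do nothing if present.
MERGE INTO events t
USING (SELECT ? AS msg_key FROM dual) s
  ON (t.msg_key = s.msg_key)
WHEN NOT MATCHED THEN
  INSERT (msg_key, status) VALUES (s.msg_key, 'NEW');

-- Second flow (PutSQL #2): update if present, insert if absent.
MERGE INTO events t
USING (SELECT ? AS msg_key FROM dual) s
  ON (t.msg_key = s.msg_key)
WHEN MATCHED THEN
  UPDATE SET t.status = 'PROCESSED'
WHEN NOT MATCHED THEN
  INSERT (msg_key, status) VALUES (s.msg_key, 'PROCESSED');
```

Both statements look up the record by the same key, which is why I expected one of them to always find the other's row.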
Attached is a picture with an outline of what I am talking about.