1. Kafka queue that outputs inventory records (Eg. columns= storeID, shoeID, sale_date, price)
2. putElasticSearchHttpRecord to insert all the inventory records into a detailed index
3. lookupService to check if each inventory record exist in the summary index
a. if does not exist, putElasticSearchHttpRecord to insert record with additional column: frequency=1 into summary index
b. if exist, increment the frequency=X+1 and putElasticSearchHttpRecord to update record into summary index
My composite key is storeID and shoeID.
The problem with streaming data is during step 3. lookupservice, when 2 identical composite key flowfiles come side by side, if record exist in summary index, the lookupservice would return the same frequency count to both flowfiles and the final updating of the frequency count would be missing one count.
My questions are
1. whether nifi have any existing processors that can avoid / minimize the impact of this issue? If yes, are there any examples? I have played with wait and notify but not sure how I can use them to check whether composite key exist in the distributedCacheServer, put the identical composite key record on wait and notify after processing is done
2. If the above has a solution, what ways can I still improve the performance using some way of distribution load such that in a nifi cluster environment, different composite keys can be set to process together without having identical composite keys waiting and becoming a heavy bottleneck?