Created 03-13-2025 02:30 AM
Hello Experts,
I have a flow where I use 'SplitRecord' processors to split JSON records, and then downstream I merge them back into a single file using the Defragment strategy.
The flow divides into 2 routes downstream, so I use a merge in 2 places. The merge works fine in one place and does not work in the other.
[Flow diagram of the setup]
On the left side, MergeContent merges fine with the Defragment strategy. But on the right side (marked in red) it gives the error below. (Since another API call sits in between, the FlowFiles might not arrive at the merge processor in order.)
(Example error: "Cannot defragment flow files with fragment Id XXXXX because the expected number of fragment is 5 but found only 3.")
I am sending 20 requests to the endpoint (HandleHttpRequest), and each request has 5 JSON records, so each request gets split into 5 FlowFiles, for a total of 100 FlowFiles.
(The issue occurs whether I send all 20 requests one after another or in parallel.)
Note: If I send only one request (which gets split into 5), there is no error at all and it works fine.
I referred to the answer from @MattWho in the post below and tried the same settings (a prioritizer is set and max bins is increased to 50):
https://community.cloudera.com/t5/Support-Questions/MergeContent-defrag-errors-when-handling-multipl...
Now the behaviour is that it works sometimes and errors out other times.
One thing to note: if I stop the MergeContent processor, keep it stopped until all messages arrive, and then start it, everything works fine.
So I tried changing "Run schedule" from 0 seconds to 60 seconds and concurrency from 1 to 5, and then it appeared to work every time.
But my case is dynamic, so a 60-second Run schedule may not be meaningful.
Is there anything I am missing? Your suggestions would be much appreciated.
Thanks in advance,
Mahendra
Created 03-17-2025 11:20 AM
@hegdemahendra
When using the "Defragment" merge strategy, the order of the FlowFiles will not have any effect. Defragment is dependent on the following FlowFile attributes being set on the FlowFiles:
The MergeContent will only merge a defragmented FlowFile if all fragments are allocated to a bin before the max bin age is reached. The bin age starts the moment the first fragment is allocated to a bin.
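To make the binning behaviour above a bit more concrete, here is a rough Python model. It is not NiFi code, only a sketch under assumptions: fragments are grouped by the `fragment.identifier` attribute, a bin completes once it holds `fragment.count` fragments, and when a new bin is needed while `max_bins` is already in use, the oldest bin is evicted and counted as failed if it is incomplete. Max bin age is left out, and the function name `defragment` is made up for the illustration.

```python
from collections import OrderedDict


def defragment(flowfiles, max_bins):
    """Toy model of defragment binning (an assumption, not NiFi's implementation).

    Returns (merged_ids, failed, pending_bins), where failed holds
    (fragment_id, fragments_found) for bins evicted before completion.
    """
    bins = OrderedDict()  # fragment.identifier -> list of fragment indexes seen
    merged, failed = [], []
    for ff in flowfiles:
        frag_id = ff["fragment.identifier"]
        if frag_id not in bins:
            if len(bins) >= max_bins:
                # No free bin: evict the oldest one, which is still incomplete.
                old_id, old_fragments = bins.popitem(last=False)
                failed.append((old_id, len(old_fragments)))
            bins[frag_id] = []
        bins[frag_id].append(ff["fragment.index"])
        if len(bins[frag_id]) == ff["fragment.count"]:
            # All fragments present: the bin is merged and released.
            del bins[frag_id]
            merged.append(frag_id)
    return merged, failed, bins


# 20 requests x 5 fragments, arriving interleaved across requests.
flowfiles = [
    {"fragment.identifier": f"req-{r}", "fragment.index": i, "fragment.count": 5}
    for i in range(5)
    for r in range(20)
]

for max_bins in (5, 20):
    merged, failed, pending = defragment(flowfiles, max_bins)
    print(max_bins, len(merged), len(failed), len(pending))
```

In this model, arrival order within a fragment set does not matter, but interleaving many fragment sets with too few bins does: bins get evicted before their remaining fragments arrive, which resembles the "expected number of fragment is 5 but found only 3" style of failure.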
Let's look at the two scenarios that would result in what you are seeing:
I verified both these scenarios on my Apache NiFi 1.26 based NiFi cluster setup.
Another thing to consider... flow design.
Things you can check and try:
Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt
Created 03-27-2025 11:03 PM
Thank you so much @MattWho for the detailed answer.
The retry logic helped a lot. I added 'RetryFlowFile' processors in between to avoid an infinite retry loop.
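For anyone reading later, the idea of bounding the retries can be sketched roughly like this in Python. This is only a model of the concept, not the RetryFlowFile processor itself: the function name `route_with_retry`, the attribute name `retries`, and the limit of 3 are assumptions for the illustration.

```python
def route_with_retry(flowfile, max_retries=3, retry_attr="retries"):
    """Return the relationship a FlowFile would take in this toy model.

    The retry counter lives in an attribute on the FlowFile (a plain dict here),
    so the loop back to the merge step stops after max_retries attempts.
    """
    count = int(flowfile.get(retry_attr, 0))
    if count < max_retries:
        flowfile[retry_attr] = count + 1  # remember this attempt
        return "retry"
    return "retries_exceeded"  # hand off to error handling instead of looping


ff = {"fragment.identifier": "req-1"}
routes = [route_with_retry(ff) for _ in range(5)]
print(routes)
```

Once the counter reaches the limit, the FlowFile leaves the loop through a separate relationship, so a fragment set that can never complete does not circulate forever.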