MergeContent processor is erroring with Defragment strategy
- Labels: Apache NiFi
Created 03-13-2025 02:30 AM
Hello Experts,
I have a flow where I use SplitRecord processors to split JSON records, and downstream I merge them back into a single file using MergeContent with the Defragment strategy.
The flow divides into two routes downstream, so I merge in two places. The merge works fine in one place but not in the other.
Flow diagram showing how it is set up:
On the left side, MergeContent merges fine with the Defragment strategy. On the right side (marked in red) it gives the error below. Since there is another API call in between, the flow files may not arrive at the merge processor in order.
(Example error: "Cannot defragment flow files with fragment Id XXXXX because the expected number of fragment is 5 but found only 3.")
I am sending 20 requests to the endpoint (HandleHttpRequest). Each request has 5 JSON records, so each request is split into 5 flow files, for a total of 100 flow files.
(The issue occurs when I send all 20 requests, whether one after another or in parallel.)
Note: If I send only one request (which is split into 5 flow files), there is no error at all and it works fine.
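To make the suspected failure mode concrete, here is a minimal Python sketch of bin-based defragmentation. It is a toy model, not NiFi's actual implementation: the function names, the `(fragment_id, index, count)` tuples, and the rule "when every bin is occupied, the oldest bin is evicted incomplete" are all assumptions made for illustration. It only shows how 20 requests x 5 fragments arriving interleaved could overflow a small number of bins, while a bin count at least equal to the number of in-flight requests lets every group complete.

```python
from collections import OrderedDict


def interleaved(requests, fragments):
    """Yield (fragment_id, index, count) round-robin across requests,
    mimicking fragments of different requests arriving mixed together."""
    for index in range(fragments):
        for r in range(requests):
            yield (f"req-{r}", index, fragments)


def defragment(flow_files, max_bins):
    """Toy model of defragmentation with a bounded number of bins.

    Assumption (hypothetical, for illustration only): when a new bin is
    needed and max_bins bins are already open, the oldest bin is evicted
    and reported as failed with however many fragments it holds.
    Returns (merged_ids, failed) where failed maps id -> (expected, found).
    """
    bins = OrderedDict()  # fragment_id -> (expected_count, [indexes]), oldest first
    merged, failed = [], {}
    for frag_id, index, count in flow_files:
        if frag_id not in bins:
            if len(bins) >= max_bins:
                old_id, (old_count, old_indexes) = bins.popitem(last=False)
                failed[old_id] = (old_count, len(old_indexes))
            bins[frag_id] = (count, [])
        expected, indexes = bins[frag_id]
        indexes.append(index)
        if len(indexes) == expected:
            # All fragments present: the group can be merged.
            merged.append(frag_id)
            del bins[frag_id]
    return merged, failed


if __name__ == "__main__":
    for max_bins in (5, 20):
        merged, failed = defragment(interleaved(20, 5), max_bins)
        print(f"max_bins={max_bins}: merged={len(merged)}, failed={len(failed)}")
```

In this model the failures depend only on arrival order relative to the number of bins, which would be consistent with a single request (one fragment group) never failing.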
I referred to the answer from @MattWho in the post below and tried the same settings (prioritizers are set and max bins increased to 50):
https://community.cloudera.com/t5/Support-Questions/MergeContent-defrag-errors-when-handling-multipl...
Now it works some of the time and errors out at other times.
One thing to note: if I stop the MergeContent processor, keep it stopped until all messages have arrived, and then start it, everything works fine.
So I tried changing "Run Schedule" from 0 seconds to 60 seconds and concurrency from 1 to 5, and then it appeared to work every time.
But my case is dynamic, so a fixed 60-second Run Schedule may not be appropriate.
Is there anything I am missing? Your suggestions would be much appreciated.
Thanks in advance,
Mahendra
