MergeRecord Defragment confusing record count and fragment.count

Hello, I have a pagination scenario, where an API returns 50 records at a time. Total count is 368, so multiple pages are involved. My goal is to combine all into a single FlowFile. NiFi version is 1.8.0.

I have set up a Notify/Wait scenario to wait until all 8 pages are gathered. Then I send all of them to a MergeRecord processor with a Defragment merge strategy. Before the MergeRecord processor, the defragment attributes of fragment.identifier, fragment.count, and fragment.index are all properly set.
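For reference, the attribute layout described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything NiFi runs: the helper name `page_attributes` is hypothetical, the 0-based `fragment.index` is an assumption, and `record.count` is included only to show how many records sit on each page.

```python
import math
import uuid

def page_attributes(total_records, page_size, fragment_id=None):
    """Return one attribute dict per page, mimicking the defragment attributes."""
    fragment_id = fragment_id or str(uuid.uuid4())
    page_count = math.ceil(total_records / page_size)
    pages = []
    for index in range(page_count):
        # The last page holds whatever is left over after the full pages.
        records_on_page = min(page_size, total_records - index * page_size)
        pages.append({
            "fragment.identifier": fragment_id,  # same value on every page
            "fragment.index": str(index),        # assumed 0-based for this sketch
            "fragment.count": str(page_count),   # number of FlowFiles, not records
            "record.count": str(records_on_page),
        })
    return pages

pages = page_attributes(368, 50)
print(len(pages))                 # 8
print(pages[-1]["record.count"])  # 18
```

With 368 records and 50 per page, this gives 8 pages: seven full pages and a last page of 18 records, each carrying fragment.count = 8.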

I expect the MergeRecord processor to join the 8 flowfiles (368 total records) into a single flowfile (with 368 total records). However, I receive this error: "fragment.count had a value of 8 but only 1 of 8 FlowFiles were encountered before this bin was evicted."

97395-2018-12-17-14-00-43-top-level-flow.png

As part of troubleshooting, I set fragment.count to 368 for all the flowfiles. The error message was similar, but interestingly it found all 8 of the flowfiles this time.

97394-2018-12-17-13-56-29-top-level-flow.png

I ended up debugging from source to understand what was going on. The code compares the record count from the first file (50) to maxRecords (8), which is populated from the fragment.count attribute. It's not an apples-to-apples comparison: one value is the number of records in an individual flowfile, the other is the number of expected flowfiles.
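To make the mismatch concrete, here is a minimal Python model of the behavior as described in this post. It is not the NiFi source: the function name `first_bin_outcome` and its logic are assumptions that simply treat the expected fragment count as a record limit for the bin, then check the number of binned flowfiles against that same value.

```python
def first_bin_outcome(record_counts, fragment_count):
    """Model the first bin: fill until the record total reaches fragment_count,
    then compare the number of binned FlowFiles against fragment_count."""
    max_records = fragment_count  # the mismatch: a FlowFile count used as a record limit
    binned_flowfiles = 0
    binned_records = 0
    for records in record_counts:
        binned_flowfiles += 1
        binned_records += records
        if binned_records >= max_records:
            break  # the bin is considered full
    complete = binned_flowfiles == fragment_count
    return binned_flowfiles, binned_records, complete

pages = [50] * 7 + [18]  # 8 pages, 368 records in total

print(first_bin_outcome(pages, 8))    # (1, 50, False)
print(first_bin_outcome(pages, 368))  # (8, 368, False)
```

Under this model, fragment.count = 8 closes the bin after the first flowfile (50 records already exceed 8), and fragment.count = 368 collects all 8 flowfiles but then fails because 8 is not 368. Both outcomes line up with the two error messages reported here.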

What am I missing here?

97396-2018-12-17-14-06-18-nifi-c-dev-nifi-rel-nifi-180-n.png

3 REPLIES

Re: MergeRecord Defragment confusing record count and fragment.count

I can get it to work if I use a SplitRecord to split all the paged files into flowfiles with a single record each. In this case, fragment.count is equal to the count of flow files, which are both equal to the total number of records.

I do not believe this is the intended design of the MergeRecord processor. Has anyone been successful configuring the MergeRecord processor in Defragment mode?
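A short Python sketch of why the single-record workaround lines up. It only models the counts, with the hypothetical helper `split_to_single_records` standing in for a SplitRecord configured to emit one record per flowfile.

```python
def split_to_single_records(record_counts):
    """Model a one-record-per-split: a page of N records becomes N FlowFiles."""
    return [1 for count in record_counts for _ in range(count)]

pages = [50] * 7 + [18]           # the original 8 paged FlowFiles
singles = split_to_single_records(pages)

fragment_count = len(singles)     # value that fragment.count would need to carry
total_records = sum(singles)

print(fragment_count, total_records)  # 368 368
```

Once every flowfile holds exactly one record, the number of flowfiles and the number of records are the same value, so it no longer matters which of the two fragment.count is compared against.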

97397-2018-12-17-16-35-00-top-level-flow.png

Re: MergeRecord Defragment confusing record count and fragment.count

Hi, I have the exact same issue. Was this issue resolved? @Adam Roderick, were you able to use the MergeRecord processor in Defragment mode?

Re: MergeRecord Defragment confusing record count and fragment.count

No, @Rohit K, unfortunately the issue persists.
