NiFi MergeContent with Defrag

Contributor

Hello, I am using NiFi 1.0.0 and am trying to merge records from an ExecuteSQL processor using MergeContent.

I wanted to try the Defrag merge strategy and have set the following attributes in an upstream UpdateAttribute processor for each flow file:

1. fragment.identifier - mmddyy of the flow file

2. fragment.index - nextInt()

3. fragment.count - executesql.row.count

4. segment.original.filename - filename

When I run the workflow, I get this error:

Cannot Defragment FlowFiles with Fragment Identifier because the expected number of fragments is <sql record count> but found only 1 fragments.

It seems like MergeContent is trying to merge too soon. I'd appreciate any advice.
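
For anyone reading along, the completeness check behind that message can be pictured roughly like this. It is a minimal Python sketch, not NiFi's actual code: the function name check_fragments, the sample identifier 012617, and the count of 3 are made up for illustration, and flow files are modeled as plain dicts of attributes.

```python
from collections import Counter


def check_fragments(flowfiles):
    """Return one message per fragment.identifier whose fragments are incomplete.

    Hypothetical model: each flow file is a dict of attribute name -> string value.
    """
    # How many fragments actually arrived for each identifier.
    found = Counter(ff["fragment.identifier"] for ff in flowfiles)
    # How many fragments each identifier claims to have.
    expected = {ff["fragment.identifier"]: int(ff["fragment.count"]) for ff in flowfiles}
    return [
        f"{ident}: expected {expected[ident]} fragments but found only {n}"
        for ident, n in found.items()
        if n != expected[ident]
    ]


# Only one of three fragments has reached the merge step so far.
arrived = [{"fragment.identifier": "012617", "fragment.index": "0", "fragment.count": "3"}]
print(check_fragments(arrived))
```

In this model, a mismatch is reported whenever the bin is evaluated before every fragment carrying that identifier has arrived.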

My workflow is:

ExecuteSQL -> SplitAvro -> UpdateAttribute (adds the fragment.* attributes; I could not see these on the SplitAvro output even though the documentation indicates they should be present) -> ConvertAvroToJSON -> EvaluateJsonPath (to extract only some SQL columns) -> ReplaceText (for conversion to comma-delimited) -> MergeContent -> PutFile

NOTE: I got inconsistent file lengths when trying out various MergeContent bin-packing configurations, so I turned to Defrag.

thanks

1 ACCEPTED SOLUTION

Master Guru

The defragment mode of MergeContent is meant to work with upstream processors that have "fragmented" a flow file and produce the standard fragment attributes (fragment.identifier, fragment.index, fragment.count). In your example, SplitAvro is one of those processors: it takes a flow file and fragments its content, but it didn't originally produce the fragment attributes. It was updated in Apache NiFi 1.1.0 (https://issues.apache.org/jira/browse/NIFI-2805) to add them, so if you upgrade you should see them.
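
To make the contract concrete, here is a small Python sketch of a splitter that stamps the three fragment attributes and a merge step that reassembles by them. It only illustrates the idea and is not how SplitAvro or MergeContent are implemented: the function names, the use of a UUID as the identifier, and the zero-based index are assumptions for this example.

```python
import uuid


def split_with_fragment_attributes(records):
    """Split records into flow-file-like dicts that share one fragment.identifier."""
    identifier = str(uuid.uuid4())  # assumption: any value shared by all fragments works
    count = len(records)
    return [
        {
            "content": record,
            "attributes": {
                "fragment.identifier": identifier,
                "fragment.index": str(index),  # assumption: zero-based position
                "fragment.count": str(count),
            },
        }
        for index, record in enumerate(records)
    ]


def defragment(fragments):
    """Reassemble fragments in index order, refusing to merge an incomplete set."""
    expected = int(fragments[0]["attributes"]["fragment.count"])
    if len(fragments) != expected:
        raise ValueError(f"expected {expected} fragments but found only {len(fragments)}")
    ordered = sorted(fragments, key=lambda f: int(f["attributes"]["fragment.index"]))
    return "".join(f["content"] for f in ordered)


fragments = split_with_fragment_attributes(["a,1\n", "b,2\n", "c,3\n"])
# Arrival order does not matter; the index restores the original order.
merged = defragment(list(reversed(fragments)))
print(merged, end="")
```

The point of the sketch is that the splitter, which knows how many pieces it produced, is the natural place to set all three attributes consistently.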


17 REPLIES

Master Guru

I think you need to remove the value for "Maximum Number of Entries". In your screenshot it is set to 1000, which means it would attempt to merge at 1000 entries before seeing all 325070 fragments. Just leave it blank.

Contributor

I am unable to leave the maximum blank, since MergeContent complains that it is not a valid integer. I also tried setting min and max to the same value (325070), as well as min 1 and max 325070, but I still get the fragment error.

Master Guru

Can you post a template of your flow (the XML file from exporting a template)?

I don't think I could help much more without seeing your exact flow.

https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#templates

Contributor

GM - The template is attached. Appreciate your help! hcc-mergecontent-issue-support.xml

Master Guru

I'm wondering if some records are not making it through the processors between SplitAvro and MergeContent; most likely it would be at EvaluateJsonPath. Can you try running this updated template and see if any flow files go to any of the LogAttribute processors? hcc-mergecontent-issue-support-updated.xml

Contributor

I don't see anything flowing into LogAttribute - snapshot of flow attached. screen-shot-2017-01-26-at-84457-pm.png

Contributor

I reran with the LogAttribute processor, but did not see any flow files going into it. screen-shot-2017-01-26-at-84457-pm.png

Contributor

Hi @ss883r,

Did you find a solution for this issue?