MergeContent processor in NiFi

Hi,

I have a flow that is expected to receive 136 FlowFiles, and once all of them have arrived, a notification should be sent via PutEmail.

The flow will be PutFile -> MergeContent -> PutEmail. My question is how we can make sure that exactly 136 FlowFiles are merged.

Does the Max Group Size property depend on the size of the FlowFiles? The size of these files varies from day to day. Is setting Max Bin Age the only option?

@Shu, can you please help?

1 ACCEPTED SOLUTION

Master Mentor
@Gillu Varghese

-

I am assuming you mean GetFile instead of PutFile?
-

How is your GetFile scheduled to run?
Is it always going to get exactly 136 FlowFiles? As far as scheduling the ingest goes, are all 136 files expected to arrive at the same time each day? If so, you can schedule GetFile with a cron expression that runs daily.

-

As far as MergeContent configuration goes, you would configure both the Minimum and Maximum Number of Entries to 136. Each entry is a FlowFile. I would also set your Max Bin Age to a value that allows sufficient time for FlowFiles 1 through 136 to be ingested. The Max Bin Age serves as your exit strategy in case not all 136 FlowFiles are ingested within that time frame. The merged FlowFile will have a new FlowFile attribute named "merge.count" that reports the number of FlowFiles in the bundle.

After your MergeContent processor I would add a RouteOnAttribute processor configured to verify that the value of the "merge.count" FlowFile attribute is actually the expected 136 (for example '${merge.count:equals('136')}'). If it is not, the bundle can be routed down a different path for error handling.
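
To make the interaction between the entry count and the Max Bin Age concrete, here is a rough Python sketch. It is not NiFi code: it assumes a simplified model with a single bin, and the names simulate_bins and route are hypothetical helpers invented for this illustration. Arrival times and the bin age are in seconds.

```python
from typing import List


def simulate_bins(arrivals: List[float], max_entries: int, max_bin_age: float) -> List[int]:
    """Return the merge.count of each bundle under a simplified single-bin model.

    Assumption: a bin is merged either when it holds max_entries FlowFiles
    or when max_bin_age has elapsed since its first FlowFile was added.
    """
    counts = []
    bin_opened = None  # arrival time of the first FlowFile in the open bin
    bin_size = 0
    for t in sorted(arrivals):
        # The open bin has aged out before this FlowFile arrived: merge it as is.
        if bin_size and t - bin_opened >= max_bin_age:
            counts.append(bin_size)
            bin_size = 0
        if bin_size == 0:
            bin_opened = t
        bin_size += 1
        # The bin is full: merge it immediately.
        if bin_size == max_entries:
            counts.append(bin_size)
            bin_size = 0
    if bin_size:
        counts.append(bin_size)  # the last partial bin eventually ages out
    return counts


def route(merge_count: int, expected: int = 136) -> str:
    """Mimic the RouteOnAttribute check on merge.count."""
    return "matched" if merge_count == expected else "unmatched"


# All 136 FlowFiles arrive within the 300 second bin age: one full bundle.
fast = simulate_bins([float(i) for i in range(136)], 136, 300.0)
# One FlowFile every 10 seconds: the bin ages out before it fills.
slow = simulate_bins([i * 10.0 for i in range(136)], 136, 300.0)
print(fast, [route(c) for c in fast])
print(slow, [route(c) for c in slow])
```

In this model the fast case yields a single bundle of 136, while the slow case yields bundles of 30, 30, 30, 30 and 16, all of which would go down the error-handling path.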

-

[Screenshot: 93739-screen-shot-2018-11-27-at-20606-pm.png]

-

Thank you,

Matt

-

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.


@Shu, can you please help?


@Matt Clarke Hi Matt, I tried the above configuration, but the 136 files are not getting merged in one go. They are getting merged in batches of 20 and 15. I had set the bin age to 5 minutes. I want a single notification after all 136 files have been merged. Can you please help?

Master Mentor

@Gillu Varghese

A few questions:
1. Are you sure all 136 files are reaching the MergeContent processor's inbound connection within 5 minutes? The bin age starts when the very first FlowFile is added to a bin. At 5 minutes from that point the bin will be merged even if not all 136 have arrived.

2. Is your NiFi a cluster or a standalone instance? If it is a cluster, are all 136 FlowFiles on the same NiFi node? Each node in a cluster can only merge FlowFiles residing on that same node. There is a new load-balanced connection feature in NiFi 1.8 that can help if this is the case: https://blogs.apache.org/nifi/entry/load-balancing-across-the-cluster
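
As a rough illustration of the second point, the Python sketch below (not NiFi code) assumes the 136 FlowFiles are spread round-robin over the nodes of a cluster and that each node merges only what it holds. The function name per_node_bundle_sizes and the 3-node cluster are assumptions made up for this example.

```python
from typing import List


def per_node_bundle_sizes(num_flowfiles: int, num_nodes: int) -> List[int]:
    """Count how many FlowFiles each node holds, and can therefore merge,
    assuming a round-robin distribution across the cluster."""
    per_node = [0] * num_nodes
    for i in range(num_flowfiles):
        per_node[i % num_nodes] += 1
    return per_node


# Hypothetical 3-node cluster: no single node ever sees all 136 FlowFiles.
sizes = per_node_bundle_sizes(136, 3)
print(sizes)
print(any(size == 136 for size in sizes))
```

With three nodes this gives bundles of 46, 45 and 45, so no bundle reaches 136 unless the FlowFiles are first brought onto a single node.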

-

Try setting your Max Bin Age to a much higher value and see what results you get.

-

Thank you,

Matt