NiFi: MergeContent queue needs to be flushed at the end of the day

Hi,

We currently use the MergeContent processor to merge Kafka messages with "Minimum Number of Entries" = 500. Once 500 messages have arrived, they are merged into a single file, and that part works fine. The problem is the end of the day: at 11:59, any Kafka messages still pending in the MergeContent queue (fewer than 500) need to be pushed out before the new date starts. How can we handle this use case?

Workflow :

kafka_consumer -> merge_content_processor -> puthdfs

@mburgess 

1 ACCEPTED SOLUTION

@varun_rathinam

The only mechanism the MergeContent processor offers to force a bin to merge before it has reached its configured minimums is the "Max Bin Age" property.

How long does it normally take to accumulate 500 FlowFiles in a bin for merge?
Set Max Bin Age to a value higher than that duration, so that full bins still merge on count and only leftover bins merge on age.

To avoid merging FlowFiles from day 1 and day 2 together (the scenario where day 2 FlowFiles start arriving at MergeContent before the Max Bin Age of a day 1 bin has been reached), I suggest using "Correlation Attribute Name" so that only FlowFiles from the same day are placed in the same bin. This requires extracting the date/day from your FlowFiles somehow (for example from the HDFS directory path, the filename, or other file metadata, if these exist) into a FlowFile attribute.
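
To make the interaction between the two properties concrete, here is a rough Python sketch of the binning behavior described above. It is only a toy model, not NiFi code: the class `DayBinner`, the 15 minute age, and the use of the arrival date as the correlation value are all assumptions for illustration. In a real flow the per-day value would come from a FlowFile attribute, for instance one set by an UpdateAttribute processor with an expression such as `${now():format('yyyy-MM-dd')}` and named in "Correlation Attribute Name".

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

MIN_ENTRIES = 500                     # mirrors "Minimum Number of Entries"
MAX_BIN_AGE = timedelta(minutes=15)   # hypothetical "Max Bin Age" value


@dataclass
class Bin:
    created: datetime                 # arrival time of the first item in the bin
    items: list = field(default_factory=list)


class DayBinner:
    """Toy model of binning keyed on a per-day correlation value (not NiFi code)."""

    def __init__(self, min_entries=MIN_ENTRIES, max_bin_age=MAX_BIN_AGE):
        self.min_entries = min_entries
        self.max_bin_age = max_bin_age
        self.bins = {}                # correlation value (day string) -> open Bin

    def flush_expired(self, now):
        """Merge every bin that is at least max_bin_age old, full or not."""
        expired = [day for day, b in self.bins.items()
                   if now - b.created >= self.max_bin_age]
        return [(day, self.bins.pop(day).items) for day in expired]

    def offer(self, message, arrived):
        """Add one message and return the list of (day, items) bundles merged."""
        merged = self.flush_expired(arrived)
        day = arrived.strftime("%Y-%m-%d")          # stand-in correlation attribute
        bin_ = self.bins.setdefault(day, Bin(created=arrived))
        bin_.items.append(message)
        if len(bin_.items) >= self.min_entries:     # bin is full: merge on count
            merged.append((day, self.bins.pop(day).items))
        return merged


# Small walk-through with min_entries lowered to 3 so it is easy to follow.
binner = DayBinner(min_entries=3, max_bin_age=timedelta(minutes=15))
binner.offer("a", datetime(2020, 1, 1, 23, 50))     # opens the day 1 bin
binner.offer("b", datetime(2020, 1, 1, 23, 55))     # day 1 bin still short of 3
binner.offer("c", datetime(2020, 1, 2, 0, 1))       # goes to a separate day 2 bin
leftovers = binner.flush_expired(datetime(2020, 1, 2, 0, 5))
print(leftovers)  # [('2020-01-01', ['a', 'b'])]
```

In this model the two leftover day 1 messages are merged on their own once the bin reaches its maximum age, while the message that arrived after midnight waits in its own day 2 bin. Note that the leftover bin is flushed up to one Max Bin Age after its first message arrived, so it may be written shortly after midnight rather than exactly at 11:59.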

Hope this helps,

Matt

Sure, thanks @MattWho. It works!