Created on 12-12-2019 09:27 PM - edited 12-13-2019 12:17 AM
Hi,
We are currently using the MergeContent processor to merge Kafka messages with "Minimum Number of Entries" = 500. When 500 messages have accumulated, they are merged into a single file, and that use case works fine. However, at the end of each day (11:59 PM), any Kafka messages still queued in the MergeContent processor need to be pushed out dynamically before the new date starts. Kindly help me with this use case.
Workflow:
kafka_consumer -> merge_content_processor -> puthdfs
Created 12-13-2019 06:46 AM
The only mechanism the MergeContent processor offers to force a bin to merge before it has reached the configured minimum values is the "Max Bin Age" property.
How long does it take to accumulate 500 FlowFiles in a bin for merge?
Set your Max Bin Age to a value higher than that duration.
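To illustrate why the Max Bin Age value matters, here is a simplified Python model of a single MergeContent bin. It is a sketch under stated assumptions, not NiFi's implementation: the function `merge_events` is hypothetical, only one bin is open at a time, bin age is counted from the bin's first entry, and the processor is assumed to run often enough that an aged-out bin merges right at its deadline.

```python
def merge_events(arrival_times, min_entries, max_bin_age):
    """Return (merge_time, entry_count) for each merged bin.

    Simplified, hypothetical model of one MergeContent bin: a bin merges when
    it reaches min_entries, or when max_bin_age has elapsed since its first
    entry, whichever comes first.
    """
    merges = []
    count = 0
    deadline = None  # time at which the open bin is forced to merge by age
    for t in sorted(arrival_times):
        if count and t >= deadline:
            # The bin hit Max Bin Age before reaching min_entries.
            merges.append((deadline, count))
            count = 0
        if count == 0:
            deadline = t + max_bin_age  # age measured from the bin's first entry
        count += 1
        if count >= min_entries:
            # Normal case: the minimum number of entries was reached.
            merges.append((t, count))
            count = 0
    if count:
        # Leftover entries (e.g. at end of day) still merge once the bin ages out.
        merges.append((deadline, count))
    return merges


print(merge_events([0, 1, 2, 5, 20, 21], min_entries=3, max_bin_age=10))
# [(2, 3), (15, 1), (30, 2)]
```

The first bin fills normally, while the two partially filled bins are flushed by age. If Max Bin Age is longer than the usual fill time, full bins are unaffected, and only the leftover messages (such as those pending at 11:59 PM) are merged by age.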
Now, to avoid merging FlowFiles from day 1 and day 2 into the same bin (the scenario where day-2 files start arriving at your MergeContent processor before the day-1 bin has reached its Max Bin Age), I suggest using the "Correlation Attribute Name" property so that only FlowFiles from the same day are placed in the same bin. This requires extracting the date/day from your FlowFiles somehow (for example, from the HDFS directory path, the filename, or some file metadata, if these exist).
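A similarly hedged sketch of how a correlation attribute keeps days apart: FlowFiles are modeled as plain dicts, and both the function `bin_by_correlation` and the attribute name `merge.date` are hypothetical (the attribute would have to be populated upstream from whatever date source your flow has). Bins are keyed by the attribute's value, so a day-2 FlowFile opens its own bin instead of joining the day-1 bin.

```python
from collections import defaultdict


def bin_by_correlation(flowfiles, attribute, min_entries):
    """Group FlowFiles into bins keyed by the value of `attribute`.

    Returns (merged, pending): bins that reached min_entries, and partially
    filled bins still waiting (these would be flushed later by Max Bin Age).
    """
    bins = defaultdict(list)
    merged = []
    for ff in flowfiles:
        key = ff[attribute]  # FlowFiles with different values never share a bin
        bins[key].append(ff["content"])
        if len(bins[key]) >= min_entries:
            merged.append((key, bins.pop(key)))
    return merged, dict(bins)


flowfiles = [
    {"merge.date": "2019-12-12", "content": "a"},
    {"merge.date": "2019-12-12", "content": "b"},
    {"merge.date": "2019-12-13", "content": "c"},  # day 2 arrives early
    {"merge.date": "2019-12-12", "content": "d"},
]
merged, pending = bin_by_correlation(flowfiles, "merge.date", min_entries=3)
print(merged)   # [('2019-12-12', ['a', 'b', 'd'])]
print(pending)  # {'2019-12-13': ['c']}
```

The early day-2 FlowFile stays in its own bin, so the day-1 merge contains only day-1 content.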
Hope this helps,
Matt
Created 12-15-2019 09:26 PM
Sure, thanks @MattWho. It works!