We have a list of zip files that we want to unzip and filter by filename. Then we want to concatenate the contents of each zip file into a single text file, one per original zip.
For example: fileA.zip, fileB.zip.
- fileA: fileA_1.html, fileA_2.html, fileA_tofilter.html, etc.
- fileB: fileB_1.html, fileB_2.html, etc.
As output we want two files, one per zip: fileA.zip => a concatenation of fileA_1.html and fileA_2.html, and likewise for fileB.
Our problem is that the MergeContent processor runs too quickly and concatenates whatever flowfiles are in the queue each time it runs. We are trying to make it wait until all files have been filtered. We tried to update the fragment.count attribute after filtering, but could not get it to work.
In your MergeContent processor, set the Merge Strategy to Defragment. The 'Defragment' algorithm combines fragments that are associated by attributes back into a single cohesive FlowFile.
You are using the UnpackContent processor, which adds fragment attributes such as fragment.index to each unpacked flowfile. With the Defragment strategy, MergeContent merges all the fragments associated with the same parent flowfile.
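To make the Defragment idea concrete, here is a minimal Python sketch of the grouping logic as I understand it. It is not NiFi code: the `defragment` function, the dictionary-based flowfiles, and the `content` key are hypothetical stand-ins. It assumes each fragment carries fragment.identifier, fragment.index and fragment.count, and that a group is merged only once the number of fragments received equals fragment.count.

```python
from collections import defaultdict

def defragment(flowfiles):
    """Group fragments by fragment.identifier and merge only complete groups.

    Illustrative model only; 'content' is a hypothetical key standing in
    for the flowfile payload.
    """
    bins = defaultdict(list)
    for ff in flowfiles:
        bins[ff["fragment.identifier"]].append(ff)

    merged, pending = {}, {}
    for ident, frags in bins.items():
        expected = int(frags[0]["fragment.count"])
        if len(frags) == expected:
            # Restore the original order before concatenating.
            frags.sort(key=lambda f: int(f["fragment.index"]))
            merged[ident] = "".join(f["content"] for f in frags)
        else:
            # Incomplete group: keep waiting for the missing fragments.
            pending[ident] = (len(frags), expected)
    return merged, pending

flowfiles = [
    {"fragment.identifier": "fileA", "fragment.index": "2", "fragment.count": "2", "content": "<p>A2</p>"},
    {"fragment.identifier": "fileB", "fragment.index": "1", "fragment.count": "2", "content": "<p>B1</p>"},
    {"fragment.identifier": "fileA", "fragment.index": "1", "fragment.count": "2", "content": "<p>A1</p>"},
    {"fragment.identifier": "fileB", "fragment.index": "2", "fragment.count": "2", "content": "<p>B2</p>"},
]

merged, pending = defragment(flowfiles)
```

In this model both groups are complete, so each original zip yields exactly one merged result regardless of the order in which the fragments arrive.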
If this answer helped resolve your issue, please click the Accept button below. That helps other community users find a solution quickly for this kind of issue.
Another option is to use the Correlation Attribute Name property in MergeContent and set it to the attribute that holds your source filename. You can then use Max Bin Age as a catch-all, so that a bin becomes eligible for merging after a set amount of time.
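The correlation approach can be sketched in Python as follows. This is only a rough model of the binning behaviour, not NiFi's implementation: `merge_by_correlation`, the `arrived_at` timestamps and the `content` key are hypothetical, and using `segment.original.filename` as the correlation attribute is an assumption about which attribute holds the source zip name in your flow.

```python
from collections import defaultdict

def merge_by_correlation(flowfiles, correlation_attr, max_bin_age, now):
    """Bin flowfiles by one attribute; merge a bin once it is old enough.

    'arrived_at' and 'now' are plain numbers (seconds) used only for
    illustration.
    """
    bins = defaultdict(list)
    for ff in flowfiles:
        bins[ff[correlation_attr]].append(ff)

    merged, waiting = {}, []
    for key, members in bins.items():
        oldest = min(ff["arrived_at"] for ff in members)
        if now - oldest >= max_bin_age:
            # The bin has aged out: merge whatever it holds, in arrival order.
            merged[key] = "".join(ff["content"] for ff in members)
        else:
            waiting.append(key)
    return merged, waiting

flowfiles = [
    {"segment.original.filename": "fileA.zip", "arrived_at": 0, "content": "A1"},
    {"segment.original.filename": "fileA.zip", "arrived_at": 1, "content": "A2"},
    {"segment.original.filename": "fileB.zip", "arrived_at": 8, "content": "B1"},
]

merged, waiting = merge_by_correlation(
    flowfiles, "segment.original.filename", max_bin_age=5, now=10
)
```

The trade-off in this model is that nothing checks whether a bin is complete: a bin is merged purely because of its age, so the age has to be long enough for unpacking and filtering to finish.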
First of all, many thanks for your help.
Secondly, the Defragment strategy leads to a process that never finishes, because of our filtering step above.
Our hypothesis is that, after filtering, the Defragment step still waits for the original total number of files and does not take into account the reduction caused by the filter. When we check the attributes on the flowfiles coming out of the filter, the fragment.count value still appears to equal the total file count.
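A small Python model of this hypothesis, for illustration only (none of this is NiFi code; `bin_status`, `recount` and the dictionary flowfiles are hypothetical): after filtering, the number of fragments received for fileA is lower than the fragment.count they still carry, so a merge that waits for fragment.count fragments would never see a complete group. The `recount` helper shows the kind of correction we were trying to achieve, namely rewriting fragment.count to the number of surviving fragments.

```python
from collections import Counter, defaultdict

def bin_status(flowfiles):
    """Return {identifier: (fragments received, fragments expected)}."""
    received = Counter(ff["fragment.identifier"] for ff in flowfiles)
    expected = {ff["fragment.identifier"]: int(ff["fragment.count"]) for ff in flowfiles}
    return {ident: (received[ident], expected[ident]) for ident in received}

def recount(flowfiles):
    """Rewrite fragment.count to the number of fragments that survived."""
    groups = defaultdict(list)
    for ff in flowfiles:
        groups[ff["fragment.identifier"]].append(ff)
    return [
        {**ff, "fragment.count": str(len(frags))}
        for frags in groups.values()
        for ff in frags
    ]

unpacked = [
    {"filename": "fileA_1.html", "fragment.identifier": "fileA", "fragment.index": "1", "fragment.count": "3"},
    {"filename": "fileA_2.html", "fragment.identifier": "fileA", "fragment.index": "2", "fragment.count": "3"},
    {"filename": "fileA_tofilter.html", "fragment.identifier": "fileA", "fragment.index": "3", "fragment.count": "3"},
]

# The filter drops one file but leaves fragment.count untouched.
filtered = [ff for ff in unpacked if "tofilter" not in ff["filename"]]

before = bin_status(filtered)           # received 2, still expecting 3
after = bin_status(recount(filtered))   # received 2, expecting 2
```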
Thank you again.