It's hard to tell from your flow whether the 4 FlowFiles you want to merge have their "fragment.*" attributes set correctly. If you use Defragment as the Merge Strategy, the FlowFiles must share the same value for the fragment.count and fragment.identifier attributes. If those are not set and you just want to take the first 4 you get, set the Merge Strategy to Bin-Packing Algorithm.
If you are running a NiFi cluster, are the FlowFiles produced by each of your SelectHiveQL processors all being produced on the same node? The MergeContent processor will not merge FlowFiles from different cluster nodes.
Assuming that all FlowFiles are on the same NiFi instance, the only ways I could reproduce your scenario were:

1. Each FlowFile had a different value assigned to the "table_name" FlowFile attribute and Merge Strategy was set to "Bin-Packing Algorithm". This caused each FlowFile to be placed in its own bin, and at the end of the 5-minute max bin age each bin of one was merged. If the intent is always to merge one FlowFile from each incoming connection, what is the purpose of setting a "Correlation Attribute Name"?
2. Maximum Number of Bins was set to 1 and the 4 source FlowFiles became queued at different times.
The "Defragment" Merge Strategy bins FlowFiles based on matching values in the "fragment.identifier" FlowFile attribute. It then merges the FlowFiles using the "fragment.index" and "fragment.count" attributes. Since you have also specified a correlation attribute, the MergeContent processor will use the value associated with that attribute instead of "fragment.identifier" to bin your FlowFiles. If each FlowFile has a unique value for "table_name", then each FlowFile ends up in a different bin and is routed to failure right away (if Maximum Number of Bins is set to 1) or after the 5-minute max bin age, since not all fragments were present.
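To make that binning behavior concrete, here is a minimal Python sketch (not NiFi code) of how the Defragment strategy groups fragments and merges complete bins. The fragment.* attribute names match NiFi's; the function and data shapes are purely illustrative:

```python
# Minimal simulation of MergeContent's Defragment strategy.
from collections import defaultdict

def defragment(flowfiles):
    """flowfiles: list of (attributes_dict, content_bytes) pairs.
    Returns (merged, failed): complete bins merged in fragment.index
    order, and incomplete bins (which NiFi would route to failure
    after the max bin age)."""
    bins = defaultdict(list)
    for attrs, content in flowfiles:
        # FlowFiles are binned by their fragment.identifier value
        bins[attrs["fragment.identifier"]].append((attrs, content))

    merged, failed = {}, {}
    for ident, members in bins.items():
        expected = int(members[0][0]["fragment.count"])
        if len(members) == expected:
            # All fragments present: merge in fragment.index order
            members.sort(key=lambda m: int(m[0]["fragment.index"]))
            merged[ident] = b"".join(content for _, content in members)
        else:
            failed[ident] = members  # not all fragments were present
    return merged, failed
```

Note how a FlowFile with a unique identifier value never completes its bin, which mirrors each unique "table_name" landing in its own bin.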
The other possibility is that "fragment.count" and "fragment.index" are set to 1 on every FlowFile.
I would stop your MergeContent processor and allow 1 FlowFile to queue in each connection feeding it. Then use the "list queue" capability to inspect the attributes on each queued FlowFile.
What values are associated with each FlowFile for the following attributes: "fragment.identifier", "fragment.index", "fragment.count", and "table_name"?
This was standalone and all FlowFiles had the same table_name value. I was able to resolve it by adding an UpdateAttribute processor after each of the Hive processors and manually setting fragment.index to 0-3 and fragment.count to 4. That way MergeContent knew to combine all the fragments into one output FlowFile. I wish the attributes could be updated within the Hive processor itself, to avoid adding yet more processors just to set attributes.
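For anyone replicating this workaround: it amounts to one UpdateAttribute processor per SelectHiveQL branch, each adding dynamic properties along these lines (fragment.index and fragment.count are real NiFi fragment attributes; the exact values below just reflect the 4-branch setup described above):

```
# UpdateAttribute after SelectHiveQL branch 1
fragment.index = 0
fragment.count = 4

# UpdateAttribute after SelectHiveQL branch 2
fragment.index = 1
fragment.count = 4

# ...branches 3 and 4 set fragment.index to 2 and 3, fragment.count to 4
```

With the shared table_name used as the Correlation Attribute, MergeContent then waits until all 4 indexed fragments are present before merging.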
I set up a similar dataflow that is working as expected. The only difference is that you made your fragment.index values 0-3 and I made mine 1-4. Is the FlowFile attribute "table_name" set on all four FlowFiles? Is the value associated with the "table_name" attribute exactly the same on all 4 FlowFiles?
Below is my test flow that worked:
As you can see, one 4-FlowFile merge was successful and a second is waiting for its 4th file before being merged.
@Matt Clarke For some reason, I was unable to reply directly to your comment above... Yes, table_name was the same across all inputs. Not sure why it wasn't working, but I moved to a different approach to resolve it: I basically unioned all the independent Hive queries to make it one input.
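The union approach replaces the four SelectHiveQL processors with a single one. A hedged sketch of what such a combined query could look like (table and column names are placeholders, not taken from the original flow):

```sql
-- One SelectHiveQL query replacing four independent queries
SELECT 'query_a' AS source, col1, col2 FROM table_a
UNION ALL
SELECT 'query_b' AS source, col1, col2 FROM table_b
UNION ALL
SELECT 'query_c' AS source, col1, col2 FROM table_c
UNION ALL
SELECT 'query_d' AS source, col1, col2 FROM table_d
```

This sidesteps MergeContent entirely, since a single query produces a single result FlowFile.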