I have 4 Hive queries returning 4 separate FlowFiles that go into MergeContent. I'd like the 4 files to be merged into one, but nothing I've tried has worked.
-all input queues to merge are "Back Pressure Object Threshold = 1"
-require all 4 flowfiles before continuing to merge
4 files go in and 4 files come out?
It's hard to tell from your flow whether the 4 FlowFiles you want to merge have their "fragment.*" attributes set correctly. If you use Defragment as the Merge Strategy, the FlowFiles must share the same value for the fragment.identifier and fragment.count attributes, and each needs its own fragment.index. If those are not set and you just want to take the first 4 you get, set Merge Strategy to Bin-Packing Algorithm.
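To make the Defragment requirement concrete, here is a rough Python model of the binning rule described above. It is only a sketch of the idea, not NiFi's actual implementation: the dict-based FlowFile representation and the helper names `make_flowfile` and `defragment` are made up for illustration.

```python
from collections import defaultdict


def make_flowfile(identifier, index, count, content):
    """Hypothetical stand-in for a FlowFile: string attributes plus content."""
    return {
        "attributes": {
            "fragment.identifier": str(identifier),
            "fragment.index": str(index),
            "fragment.count": str(count),
        },
        "content": content,
    }


def defragment(flowfiles):
    """Simplified model of a Defragment-style merge.

    FlowFiles are binned by fragment.identifier. A bin is merged only when
    it holds exactly fragment.count members; otherwise its members wait.
    Returns (merged_contents, waiting_flowfiles).
    """
    bins = defaultdict(list)
    waiting = []
    for ff in flowfiles:
        attrs = ff["attributes"]
        if "fragment.identifier" not in attrs or "fragment.count" not in attrs:
            # Assumption of this sketch: unbinnable FlowFiles are left unmerged.
            waiting.append(ff)
            continue
        bins[attrs["fragment.identifier"]].append(ff)

    merged = []
    for members in bins.values():
        expected = int(members[0]["attributes"]["fragment.count"])
        if len(members) == expected:
            # Order the fragments by fragment.index before concatenating.
            ordered = sorted(
                members, key=lambda f: int(f["attributes"]["fragment.index"])
            )
            merged.append("".join(f["content"] for f in ordered))
        else:
            waiting.extend(members)
    return merged, waiting
```

In this model, four FlowFiles that share one fragment.identifier and carry fragment.count = 4 produce a single merged result, while four FlowFiles that each carry a different fragment.identifier land in four separate bins and never merge.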
Is this a NiFi standalone or a NiFi cluster?
If it is a cluster, are the FlowFiles from all of your SelectHiveQL processors being produced on the same node? The MergeContent processor will not merge FlowFiles from different cluster nodes.
Assuming that all FlowFiles are on same NiFi instance, the only way I could reproduce your scenario was:
I would stop your MergeContent processor and allow 1 FlowFile to queue in each connection feeding it. Then use the "list queue" capability to inspect the attributes on each queued FlowFile.
What values are associated with each FlowFile for the following attributes:
Thanks Matt and Matt!
Ugh... I was incorrect. It does NOT wait for all 4 when all processors are running... Back to unresolved. I must've had the merge turned off on the previous run, then turned it on.
I set up a similar dataflow that is working as expected. The only difference is that you made your fragment.index values 0-3 and I made mine 1-4. Is the FlowFile attribute "table_name" set on all four FlowFiles? Is its value exactly the same on all 4 FlowFiles?
Below is my test flow that worked:
As you can see, one 4-FlowFile merge was successful, and a second is waiting for its 4th FlowFile before being merged.
@Matt Clarke For some reason, I was unable to reply directly to your comment above... Yes, table_name was the same across all inputs. Not sure why it wasn't working, but I moved to a different approach to resolve it: I basically unioned all the independent Hive queries so there is only one input.
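For anyone who lands here later, the union workaround can be sketched roughly as below. This is an illustration only: the helper name `union_queries` and the example table names are made up, it assumes UNION ALL is acceptable (duplicates are kept) and that every query returns the same columns, and whether the combined statement runs as-is depends on your Hive version.

```python
def union_queries(queries):
    """Join independent SELECT statements into one UNION ALL statement."""
    # Drop blank entries and trailing semicolons so the pieces can be joined.
    cleaned = [q.strip().rstrip(";") for q in queries if q.strip()]
    if not cleaned:
        raise ValueError("at least one query is required")
    return "\nUNION ALL\n".join(cleaned)


# Hypothetical table names, for illustration only.
combined = union_queries([
    "SELECT id, name FROM table_a;",
    "SELECT id, name FROM table_b;",
])
print(combined)
```

A single query then yields a single FlowFile, so no merge step is needed downstream.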