Created on 08-15-2017 05:34 PM - edited 08-17-2019 07:16 PM
I have 4 Hive queries returning 4 separate FlowFiles that feed into MergeContent. I'd like the 4 files to be merged into one, but nothing I've tried has worked.
-all input queues to merge are "Back Pressure Object Threshold = 1"
-require all 4 flowfiles before continuing to merge
4 files go in and 4 files come out?
Created 08-15-2017 05:49 PM
It's hard to tell from your flow whether the 4 flow files you want to merge have their "fragment.*" attributes set correctly. If you use Defragment as the Merge Strategy, the flow files must share the same values for the fragment.count and fragment.identifier attributes. If those are not set and you just want to take the first 4 you get, set Merge Strategy to Bin-Packing Algorithm.
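To make the Defragment requirement concrete, here is a rough Python sketch of the grouping idea. It is a simplified model and not NiFi's actual implementation: the `defragment` and `make_flowfile` helpers, the dictionary layout, and the "batch-1" / "batch-2" identifiers are all made up for illustration. Only the attribute names fragment.identifier, fragment.index, and fragment.count come from MergeContent.

```python
from collections import defaultdict


def make_flowfile(content, frag_id, index, count):
    """Hypothetical stand-in for a FlowFile: content plus string attributes."""
    return {
        "content": content,
        "attributes": {
            "fragment.identifier": frag_id,
            "fragment.index": str(index),
            "fragment.count": str(count),
        },
    }


def defragment(flowfiles):
    """Group by fragment.identifier and merge only the groups that are complete."""
    bins = defaultdict(list)
    for ff in flowfiles:
        bins[ff["attributes"]["fragment.identifier"]].append(ff)

    merged, waiting = [], {}
    for frag_id, members in bins.items():
        expected = int(members[0]["attributes"]["fragment.count"])
        if len(members) == expected:
            # Complete group: concatenate the content in fragment.index order.
            ordered = sorted(members, key=lambda f: int(f["attributes"]["fragment.index"]))
            merged.append("".join(f["content"] for f in ordered))
        else:
            # Incomplete group: keep holding it until the rest arrive.
            waiting[frag_id] = members
    return merged, waiting


# Four fragments of "batch-1" arrive out of order; "batch-2" has only one so far.
flowfiles = [make_flowfile(f"rows-{i};", "batch-1", i, 4) for i in (2, 0, 3, 1)]
flowfiles.append(make_flowfile("rows-0;", "batch-2", 0, 4))

merged, waiting = defragment(flowfiles)
print(merged)           # one merged result for batch-1
print(sorted(waiting))  # batch-2 is still waiting for 3 more fragments
```

In this model, if the four FlowFiles do not carry the same fragment.identifier, each one lands in its own group and none of the groups ever reaches its fragment.count.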
Created 08-15-2017 05:57 PM
Created 08-15-2017 06:01 PM
The issue you are most likely running into is caused by having only 1 bin.
https://issues.apache.org/jira/browse/NIFI-4299
Change the number of bins to at least 2 and see if that resolves your issue.
Thanks,
Matt
Created 08-15-2017 08:03 PM
Created 08-16-2017 01:46 PM
Is this a NiFi standalone or a NiFi cluster?
If it is a cluster, are the FlowFiles from each of your SelectHiveQL processors produced on the same node? The MergeContent processor will not merge FlowFiles from different cluster nodes.
Assuming that all FlowFiles are on same NiFi instance, the only way I could reproduce your scenario was:
I would stop your MergeContent processor and allow 1 FlowFile to queue in each connection feeding it. Then use the "list queue" capability to inspect the attributes on each queued FlowFile.
What values are associated with each FlowFile for the following attributes:
Thank you,
Matt
Created on 08-16-2017 06:17 PM - edited 08-17-2019 07:16 PM
Thanks Matt and Matt!
Created 08-16-2017 10:14 PM
Ugh... I was incorrect. It does NOT wait for all 4 when all processors are running... Back to unresolved. I must've had the merge turned off on the previous run, then turned it on.
Created on 08-29-2017 05:32 PM - edited 08-17-2019 07:16 PM
I set up a similar dataflow that is working as expected. The only difference is that you made your fragment.index values 0-3 and I made mine 1-4. Is the FlowFile attribute "table_name" set on all four FlowFiles? Is the value of "table_name" exactly the same on all 4 FlowFiles?
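If you copy the attributes out of the "list queue" view, a small script can check them for consistency. The sketch below is only an illustration: `check_fragment_attributes` is a hypothetical helper, the sample values are invented, and it checks the fragment.* attributes only, so it says nothing about how "table_name" is used in your flow.

```python
def check_fragment_attributes(attribute_sets):
    """Return a list of problems; an empty list means the attributes look consistent."""
    problems = []

    # fragment.identifier and fragment.count must be present and identical everywhere.
    for name in ("fragment.identifier", "fragment.count"):
        values = {attrs.get(name) for attrs in attribute_sets}
        if None in values:
            problems.append(f"{name} is missing on at least one FlowFile")
        elif len(values) > 1:
            problems.append(f"{name} differs across FlowFiles: {sorted(values)}")

    # fragment.index must be present and unique per FlowFile.
    indexes = [attrs.get("fragment.index") for attrs in attribute_sets]
    if None in indexes:
        problems.append("fragment.index is missing on at least one FlowFile")
    elif len(set(indexes)) != len(indexes):
        problems.append("fragment.index values are not unique")

    # If the count is consistent, it should match the number of FlowFiles inspected.
    counts = {attrs.get("fragment.count") for attrs in attribute_sets}
    if len(counts) == 1 and None not in counts:
        (count,) = counts
        if int(count) != len(attribute_sets):
            problems.append(f"fragment.count is {count} but {len(attribute_sets)} FlowFiles were inspected")

    return problems


# Invented sample: the last identifier has a trailing space, which is easy to miss by eye.
queued = [
    {"fragment.identifier": "orders", "fragment.index": "0", "fragment.count": "4"},
    {"fragment.identifier": "orders", "fragment.index": "1", "fragment.count": "4"},
    {"fragment.identifier": "orders", "fragment.index": "2", "fragment.count": "4"},
    {"fragment.identifier": "orders ", "fragment.index": "3", "fragment.count": "4"},
]
problems = check_fragment_attributes(queued)
for problem in problems:
    print(problem)
```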
Below is my test flow that worked:
As you can see, one merge of 4 FlowFiles was successful, and a second bin is waiting for its 4th FlowFile before being merged.
Thanks,
Matt
Created 08-29-2017 05:45 PM
@Matt Clarke For some reason, I was unable to reply directly to your comment above... Yes, table_name was the same across all inputs. I'm not sure why it wasn't working, but I moved to a different approach to resolve it: I unioned all the independent Hive queries so that there is only one input.