My flow looks like this:
FlowFile (in JSON format) --> SplitJson --> MergeContent (Max Bin Age: 10 min) --> UpdateAttribute --> PutHDFS
Since I have a 2-node cluster, the output of MergeContent is always 2 files. Can I merge them into one file?
The reason I want to do this is that the output of PutHDFS is connected to --> ReplaceText (which replaces the flowfile content with a Hive query) --> PutHiveQL. My issue is that the two MergeContent outputs arrive about 1 minute apart, so two identical queries run at nearly the same time and generate duplicate data. If I expire one flowfile, the query doesn't pick up the data from the second flowfile. If I cannot merge into one file, is it possible to add a **wait** until both flowfiles reach the queue for ReplaceText, so that only one query is generated and run by PutHiveQL?
Your help is highly appreciated. Thanks in advance.
You are using the SplitJson processor, which adds the fragment.index, fragment.identifier, and fragment.count attributes to each flowfile it produces.
Using those attributes, the MergeContent processor can merge all the split flowfiles back into one flowfile.
To do this, configure the MergeContent processor as follows:
Merge Strategy: Defragment // the processor waits until all the fragments of a split have arrived, then merges them back into a single flowfile.
With this strategy, you should no longer get 2 flowfiles from the MergeContent processor.
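To illustrate the idea behind the Defragment strategy, here is a minimal Python sketch of the grouping logic. It is not NiFi's implementation. The function names `make_fragments` and `defragment`, the dictionary layout of a flowfile, and the newline used to join contents are all assumptions made for illustration. Only the three attribute names come from SplitJson.

```python
from collections import defaultdict


def make_fragments(identifier, records):
    """Hypothetical helper: mimic the attributes SplitJson adds to each split."""
    return [
        {
            "content": record,
            "attributes": {
                "fragment.identifier": identifier,
                "fragment.index": str(index),  # index values here are illustrative
                "fragment.count": str(len(records)),
            },
        }
        for index, record in enumerate(records)
    ]


def defragment(flowfiles):
    """Group fragments by fragment.identifier and merge a group only once
    all fragment.count pieces have arrived, ordered by fragment.index.

    Returns (merged_contents, still_pending_bins).
    """
    bins = defaultdict(dict)
    merged = []
    for flowfile in flowfiles:
        attrs = flowfile["attributes"]
        identifier = attrs["fragment.identifier"]
        index = int(attrs["fragment.index"])
        count = int(attrs["fragment.count"])
        bins[identifier][index] = flowfile["content"]
        # A bin is complete only when every fragment of the split is present.
        if len(bins[identifier]) == count:
            parts = bins.pop(identifier)
            merged.append("\n".join(parts[i] for i in sorted(parts)))
    return merged, dict(bins)
```

In this sketch an incomplete group stays in the pending bins and produces no output, which is the waiting behavior described above: nothing is emitted until every fragment of the original flowfile is present.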
If this answer helped resolve your issue, please click the Accept button below. That helps other community users find solutions to similar issues quickly.