
MergeContent produces 2 files on a 2-node cluster. How can I merge them into one large JSON file in NiFi?


Explorer

My flow looks like this:

Flow file (in JSON format) --> SplitJson --> MergeContent (Bin Age: 10 min) --> UpdateAttribute --> PutHDFS

Since I have a 2-node cluster, the output of MergeContent is always 2 files. Can I merge them into one file?

The reason I want to do this: the output of PutHDFS is connected to ReplaceText (which replaces the flow file content with a Hive query) --> PutHiveQL. The two MergeContent outputs arrive about 1 minute apart, so I end up with 2 identical queries running at the same time and generating duplicate data. If I expire one flow file, the query doesn't gather the information from the second flow file. If I cannot merge into one file, is it possible to add a **wait** until both flow files reach the ReplaceText queue, so that only one query is generated and run by PutHiveQL?

Your help is highly appreciated. Thanks in advance.

Sudheer


Re: The output of merge content generate 2 files due to a 2 node cluster,How can I merge into one huge Json file in the end using NIFI

Super Guru

@Sudheer K

Since you are using the SplitJson processor, each split flow file already carries the fragment.index, fragment.identifier, and fragment.count attributes that SplitJson adds.

Using those attributes, the MergeContent processor can merge all the split flow files back into one flow file.

To do this, configure MergeContent as follows:

Merge Strategy: Defragment (the processor waits until all the flow files of a split have been merged into a single file again)

With this strategy you will not get 2 flow files from MergeContent.

See this and this link for more on the MergeContent Defragment strategy.
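To illustrate the idea behind Defragment, here is a minimal Python sketch of the grouping logic. It is not NiFi code: the `fragment` and `defragment` helpers are hypothetical names, flow files are modeled as plain dictionaries, and the merged content is a simple concatenation (an assumption; MergeContent has its own formatting options). It only shows how fragments sharing a fragment.identifier are held back until fragment.count of them have arrived, then joined in fragment.index order.

```python
from collections import defaultdict


def fragment(identifier, index, count, content):
    """Build a toy flow file carrying the three attributes SplitJson adds."""
    return {
        "attributes": {
            "fragment.identifier": identifier,
            "fragment.index": str(index),
            "fragment.count": str(count),
        },
        "content": content,
    }


def defragment(flowfiles):
    """Group fragments by identifier and merge a group only once it is complete.

    Returns (merged, pending): merged is a list of reassembled contents,
    pending maps each incomplete identifier to the fragments seen so far.
    """
    bins = defaultdict(dict)  # identifier -> {index: content}
    merged = []
    for ff in flowfiles:
        attrs = ff["attributes"]
        ident = attrs["fragment.identifier"]
        index = int(attrs["fragment.index"])
        count = int(attrs["fragment.count"])
        bins[ident][index] = ff["content"]
        # Release the bin only when every fragment of the split has arrived.
        if len(bins[ident]) == count:
            parts = bins.pop(ident)
            merged.append("".join(parts[i] for i in sorted(parts)))
    return merged, dict(bins)


if __name__ == "__main__":
    # Fragments arrive out of order; split "b" is still missing one piece.
    arrivals = [
        fragment("a", 1, 2, '{"id": 2}'),
        fragment("b", 0, 2, '{"id": 9}'),
        fragment("a", 0, 2, '{"id": 1}'),
    ]
    merged, pending = defragment(arrivals)
    print(merged)
    print(sorted(pending))
```

In this sketch a bin can only be completed by fragments that reach the same `defragment` call, so fragments of one split that end up in two separate places would never form a complete bundle.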

-

If the answer helped resolve your issue, click the Accept button below to accept it. That helps other community users find solutions to these kinds of issues quickly.

Re: The output of merge content generate 2 files due to a 2 node cluster,How can I merge into one huge Json file in the end using NIFI

Explorer

@Shu

Thanks for the information. I tried the above and it doesn't work. It still generates 2 files. Screenshot below

capture.jpg

Your help is appreciated

Re: The output of merge content generate 2 files due to a 2 node cluster,How can I merge into one huge Json file in the end using NIFI

Super Guru

@Sudheer K

Not sure why you have Max Bin Age set to 2 mins. Could you try once with the Max Bin Age value removed?

If it still doesn't work, please attach your flow.xml if possible.
