
Merge Content after route on attribute


Hi,

This is my flow

Split Text -> Route on Attribute -> Merge Content (Value 0)
                                 -> Merge Content (Value 1)

Split Text can generate any number of files.

Route on Attribute routes on an attribute that can have only two values (0 or 1). I want each Merge Content to wait only for the flow files sent down its own route. For example, if 5 flow files are routed to Merge Content (Value 0), that processor should wait for all 5 and merge them; similarly, if 4 flow files are routed to Merge Content (Value 1), that processor should wait for all 4 and merge them.

Can you please suggest how I can implement this? My findings so far:

1. Using Defragment: this will not work, since both Merge Content processors wait for all the flow files of the split (same fragment.identifier), not just those on their own route.

2. Using Bin-Packing: this gets close, by setting the correlation identifier to my attribute (whose value is 0 or 1). However, it does not wait for all the flow files going down that route.

Kindly let me know if you need additional details.

Thank you!

Yuva


Re: Merge Content after route on attribute

@Yuvapraveen Mathivannan

As I don't know the complete picture of your use case, here are some thoughts on this issue.

1. If you want to use the MergeContent processor, you can set a minimum number of flowfiles, and the processor will wait until that many flowfiles have arrived before merging them into one. In your case, however, we don't know how many flowfiles the processor needs to wait for before merging.

- We can use the Max Bin Age property (e.g. 1 min, 5 mins, 1 hr) to force the bin to merge all the flowfiles waiting in the processor once that age is reached.
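
To make the interaction between the minimum entry count and Max Bin Age a bit more concrete, here is a rough Python sketch of the idea. It is only a simplified model, not NiFi's actual implementation: the function name simulate_merge, the Bin class, and the event tuples are all made up for illustration, and bins are keyed by the route value (0 or 1) the way a correlation attribute would group them.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    created_at: float                      # arrival time of the first flowfile in the bin
    items: list = field(default_factory=list)


def simulate_merge(events, min_entries, max_bin_age):
    """Toy model of bin-packing with a correlation attribute (hypothetical helper).

    events: list of (arrival_time_seconds, route_value, content), sorted by time.
    Returns a list of (merge_time, route_value, merged_contents).
    """
    bins = {}
    merged = []
    for now, route, content in events:
        # Force-merge any bin that has reached the maximum age.
        for key in list(bins):
            b = bins[key]
            if now - b.created_at >= max_bin_age:
                merged.append((b.created_at + max_bin_age, key, b.items))
                del bins[key]
        # Add the flowfile to the bin for its route, creating the bin if needed.
        b = bins.setdefault(route, Bin(created_at=now))
        b.items.append(content)
        # Merge as soon as the minimum number of entries is reached.
        if len(b.items) >= min_entries:
            merged.append((now, route, b.items))
            del bins[route]
    # No more input: whatever is left is merged when each bin hits the max age.
    for key, b in bins.items():
        merged.append((b.created_at + max_bin_age, key, b.items))
    return merged


events = [(0, "0", "a"), (1, "1", "b"), (2, "0", "c"), (3, "1", "d"), (4, "0", "e")]
# Minimum set high on purpose, so only the bin age triggers the merge.
result = simulate_merge(events, min_entries=100, max_bin_age=60)
```

With the minimum set higher than any route will ever receive, each route's bin simply collects everything that arrives and is merged once when the age limit expires, which is the behaviour described above when the real count is unknown. The trade-off is that the age has to be long enough for all the splits to arrive.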

(or)

2. If you are filtering records to exclude some of them from the flowfile, use the QueryRecord processor instead.

- Configure and enable the Record Reader/Writer controller services, and add a property containing the SQL query, for example:

select * from FLOWFILE where id <> 10

The processor then runs this SQL query and sends the results as the output flowfile.

This way we don't need to use the MergeContent or SplitText processors.
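
If it helps to see what that query does to the records, here is a small Python sketch that runs the same statement against an in-memory SQLite table named FLOWFILE. SQLite is only a stand-in here, not the SQL engine QueryRecord uses, and the sample records, the name column, and the filter_records helper are assumptions made up for the example.

```python
import sqlite3

# Hypothetical records standing in for the rows of one flowfile.
records = [
    {"id": 9, "name": "alpha"},
    {"id": 10, "name": "beta"},
    {"id": 11, "name": "gamma"},
]


def filter_records(rows):
    """Load rows into a table called FLOWFILE and apply the filter query."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE FLOWFILE (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO FLOWFILE (id, name) VALUES (:id, :name)", rows)
    # Same query as above: keep every record whose id is not 10.
    cursor = conn.execute("select * from FLOWFILE where id <> 10")
    result = [dict(row) for row in cursor]
    conn.close()
    return result


kept = filter_records(records)
```

All records stay together in one result set and only the excluded ones are dropped, so there is nothing to split and nothing to merge back afterwards.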
