Count the number of FlowFiles that have the same filename attribute

New Contributor

Hi,

I have a number of flow files coming into a MergeContent processor, which merges them into a zip file. However, some of the flow files have the same name, which causes a duplicate entry error. I want to add a counter value to their filename attribute so that they can be merged (e.g. A.txt (1), A.txt (2), etc.).

I used a DetectDuplicate processor to separate out the flow files with duplicate filenames, but I am not sure how to add a counter variable to their filename attribute. Can anyone give me an idea of how to solve this?

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Count the number of FlowFiles that have the same filename attribute

New Contributor

I just figured out a solution using a Wait/Notify processor pair. Each Notify processor allows only one flow file with a duplicated filename through. The UpdateAttribute processor updates a count variable, which the Notify processor sends back to the Wait processor.

80489-untitled.png

3 REPLIES

Re: Count the number of FlowFiles that have the same filename attribute

Super Guru

@Hoa Vuong

Feed the duplicate relationship from the DetectDuplicate processor to an UpdateAttribute processor that uses the subjectless nextInt() function.

Add a new property:

filename

${filename}(${nextInt()})

This expression appends the nextInt() value to the filename.

See this link for more on nextInt() function usage.
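
To make the effect of that expression concrete, here is a minimal Python sketch that imitates it outside NiFi. It assumes nextInt() behaves as a single counter starting at 0 that every FlowFile shares; `rename_with_next_int` is a hypothetical helper, not a NiFi API.

```python
import itertools

# Stand-in for NiFi's nextInt(): one counter shared by every FlowFile
# (assumed here to start at 0).
_next_int = itertools.count(0)


def rename_with_next_int(filename: str) -> str:
    """Imitate the expression ${filename}(${nextInt()})."""
    return f"{filename}({next(_next_int)})"


names = [rename_with_next_int(n) for n in ["A.txt", "A.txt", "B.txt"]]
print(names)  # ['A.txt(0)', 'A.txt(1)', 'B.txt(2)']
```

Two things stand out in the sketch: the number lands after the extension, and the counter is shared across all filenames rather than kept per filename.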

(Or)

Store state in the UpdateAttribute processor.

Add a new property:

theCount

${getStateValue("theCount"):plus(1)}

Use another UpdateAttribute processor to add the theCount attribute to the filename.

Refer to this link regarding getStateValue function usage.

79458-state.png

Add a new property:

filename

${filename}(${theCount})

With this approach you can reset the state value to 0 once it reaches your threshold (for example, if the value is 100, set it back to 0). Refer to this link regarding resetting the value.
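
The stateful counter with a reset can be sketched in Python as follows. This only imitates the behavior described above and is not how NiFi stores state; the `CounterState` class and its `threshold` parameter are hypothetical names, and resetting right after the threshold value is emitted is an assumption.

```python
class CounterState:
    """Hypothetical stand-in for the state kept by UpdateAttribute."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.the_count = 0  # assumed initial state value

    def next_value(self) -> int:
        # Same idea as ${getStateValue("theCount"):plus(1)}
        self.the_count += 1
        value = self.the_count
        # Assumed reset rule: go back to 0 once the threshold is reached
        if self.the_count >= self.threshold:
            self.the_count = 0
        return value


state = CounterState(threshold=3)
values = [state.next_value() for _ in range(5)]
print(values)  # [1, 2, 3, 1, 2]
```

As with nextInt(), this is one counter for all FlowFiles, so it does not distinguish between different filenames.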

Re: Count the number of FlowFiles that have the same filename attribute

New Contributor

Hi @Shu,

Thanks for your suggested solution, but it doesn't work in my setup. I might have 100 flow files coming out of the duplicate relationship of the DetectDuplicate processor. 50 of them will have the filename A.txt while the rest will be B.txt. The expected output would be A (1).txt, ..., A (50).txt and B (1).txt, ..., B (50).txt. Since the number of flow files is not fixed, I can't really reset the state value. They all have the same ${segment.original.filename} value, by the way. If another 10 flow files named A.txt come out of the DetectDuplicate processor with a different ${segment.original.filename} value, then these flow files should be named from 1 to 20.
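
The naming I'm after can be sketched in Python like this: one counter per filename, with the number placed before the extension. `number_duplicates` is a hypothetical helper used only to illustrate the expected output; it runs outside NiFi and does not model the grouping by ${segment.original.filename}.

```python
import os
from collections import defaultdict


def number_duplicates(filenames):
    """Give each filename its own counter and put it before the extension."""
    counters = defaultdict(int)  # one independent counter per filename
    renamed = []
    for name in filenames:
        counters[name] += 1
        stem, ext = os.path.splitext(name)
        renamed.append(f"{stem} ({counters[name]}){ext}")
    return renamed


result = number_duplicates(["A.txt", "B.txt", "A.txt", "A.txt", "B.txt"])
print(result)  # ['A (1).txt', 'B (1).txt', 'A (2).txt', 'A (3).txt', 'B (2).txt']
```

A single shared counter would number these five files 1 to 5 regardless of name, which is why one state variable is not enough here.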
