
How to count the number of occurrences of a word (similar to word count) and do an action on it

Expert Contributor

Hi All,

I have a use case where I want to find the number of occurrences of a word and perform an action based on it.

Example:

1. I have multiple flowfiles coming in.

2. I want to extract a word (say, user_name) using the ExtractText processor.

3. Count the word.

4. If user_name_count = 10:

5. Use ReplaceText to replace 10 with 1.

6. Use PutEmail to tell user_name that the user_name count is 10.

Can you please let me know which processors would be helpful for this use case?

Suggestions are appreciated.

1 ACCEPTED SOLUTION

Master Guru
@Mark

I tried your case using UpdateAttribute's ability to store state.
Flow:

1. Two GenerateFlowFile processors // to get 2 flowfiles
2. SplitText // split the flowfile into 1 line each
3. ExtractText // extract the first value from the content
4. RouteOnAttribute // check the extracted value in the flowfile attribute
5. UpdateAttribute // add one to the seq attribute and reset the seq value when it reaches 10 (advanced usage of the UpdateAttribute processor)
6. RouteOnAttribute // check the seq attribute value and send to PutEmail if seq = 10
7. PutEmail // send the mail
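The counter logic in steps 5 and 6 can be sketched outside NiFi to check the reset behaviour. This is a plain Python simulation, not NiFi code: a dict stands in for the state UpdateAttribute keeps when it is configured to store state, keying that state by user_name is my assumption (the flow above tracks one routed value), and `notify` is a hypothetical stand-in for PutEmail.

```python
from typing import Callable, Dict, List

THRESHOLD = 10  # assumed threshold, matching "seq = 10" in the steps above


class SeqCounter:
    """Simulates a stateful UpdateAttribute counter: one `seq` value per user_name."""

    def __init__(self, threshold: int = THRESHOLD) -> None:
        self.threshold = threshold
        self.state: Dict[str, int] = {}  # stands in for UpdateAttribute's stored state

    def update(self, user_name: str) -> int:
        """Step 5: add one to seq, starting over at 1 once the threshold was reached."""
        previous = self.state.get(user_name, 0)
        seq = 1 if previous >= self.threshold else previous + 1
        self.state[user_name] = seq
        return seq


def route(users: List[str], notify: Callable[[str, int], None]) -> None:
    """Step 6: call notify (a stand-in for PutEmail) whenever seq equals the threshold."""
    counter = SeqCounter()
    for user in users:
        seq = counter.update(user)
        if seq == counter.threshold:
            notify(user, seq)


sent: List[str] = []
route(["mark"] * 25 + ["stella"] * 3, lambda user, seq: sent.append(user))
print(sent)  # ['mark', 'mark']
```

Mark reaches 10 on his 10th and 20th flowfiles, so two mails go out, while stella (3 flowfiles) never triggers one. Resetting to 1 on the next flowfile, rather than immediately, keeps seq = 10 visible to the RouteOnAttribute check.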

I have attached the flow template; reuse it and change it as per your requirements.

222030-support-update-reset.xml

Master Guru
@Mark

We need more details to provide a correct solution for this case:
1. Could you please provide some sample data for this case?

2. Do you want to count user_name within a particular flowfile, i.e. if the flowfile content contains user_name 10 times, then send out an email?

(or)
Do you want to count flowfiles that have user_name and send out an email once the count reaches 10?

3. Do you know the schema of the flowfile?

Expert Contributor

@Shu

1. Sample data:

Every value is present in attributes (i.e. every flowfile is parsed and its values are assigned to attributes).

There are multiple flowfiles with the same value (user_name) in their attributes.

Example:

flowfile1 attributes:
user_name: mark, file_in: 2018-09-18 15:00:00, file_out: 2018-09-18 15:01:00
user_name: michelle, file_in: 2018-09-18 15:00:02, file_out: 2018-09-18 15:01:01
user_name: mark, file_in: 2018-09-18 15:00:05, file_out: 2018-09-18 15:01:01

flowfile2 attributes:
user_name: mark, file_in: 2018-09-18 15:01:00, file_out: 2018-09-18 15:01:10
user_name: stella, file_in: 2018-09-18 15:01:12, file_out: 2018-09-18 15:01:21

2. I want to count user_name across all the flowfiles (in the above example, the count of mark is 3 across both flowfiles).

3. The schema of the flowfile is just the 3 fields above, which are assigned to attributes.
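To pin down the "count across all flowfiles" interpretation, here is a small Python sketch that parses the sample records above and tallies user_name across both flowfiles. It treats each line of the sample as one record and assumes the "key: value, key: value" layout shown; it only illustrates the expected result and is not a NiFi configuration.

```python
from collections import Counter
from typing import Dict, List

# Sample attribute records from the two flowfiles above, one record per line.
FLOWFILES: Dict[str, List[str]] = {
    "flowfile1": [
        "user_name: mark, file_in: 2018-09-18 15:00:00, file_out: 2018-09-18 15:01:00",
        "user_name: michelle, file_in: 2018-09-18 15:00:02, file_out: 2018-09-18 15:01:01",
        "user_name: mark, file_in: 2018-09-18 15:00:05, file_out: 2018-09-18 15:01:01",
    ],
    "flowfile2": [
        "user_name: mark, file_in: 2018-09-18 15:01:00, file_out: 2018-09-18 15:01:10",
        "user_name: stella, file_in: 2018-09-18 15:01:12, file_out: 2018-09-18 15:01:21",
    ],
}


def parse_record(line: str) -> Dict[str, str]:
    """Turn 'key: value, key: value' into a dict; split on ': ' once so times keep their colons."""
    return dict(field.split(": ", 1) for field in line.split(", "))


def count_users(flowfiles: Dict[str, List[str]]) -> Counter:
    """Count user_name values across all flowfiles combined, not per flowfile."""
    counts: Counter = Counter()
    for records in flowfiles.values():
        for line in records:
            counts[parse_record(line)["user_name"]] += 1
    return counts


counts = count_users(FLOWFILES)
print(counts["mark"])  # 3
```

Counting per flowfile instead would give mark 2 in flowfile1 and 1 in flowfile2, which is why the distinction asked about earlier matters.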

Thank you


While I usually recommend using the existing processors to perform individual tasks and chaining them together to achieve your overall goal, I think this is a case where an ExecuteScript processor with a custom script could be best. As long as the input is not on the order of 10 MB+ per flowfile, a simple Ruby, Groovy, or Python script should handle the text searching and counting well, and it can produce exactly the output you want to route directly to the PutEmail processor.
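The counting part of such a script is short. Below is a sketch of only that logic in plain Python; it deliberately leaves out the ExecuteScript session and callback API, which a real script would need to wrap around it to read flowfile content. Whole-word, case-sensitive matching and the threshold of 10 are assumptions, and `email_body_if_threshold` is a hypothetical helper name.

```python
import re
from typing import Optional


def count_word(text: str, word: str) -> int:
    """Count whole-word, case-sensitive occurrences of `word` in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text))


def email_body_if_threshold(text: str, word: str, threshold: int = 10) -> Optional[str]:
    """Return a message for the email step once the count reaches threshold, else None."""
    count = count_word(text, word)
    if count >= threshold:
        return f"{word} occurs {count} times in this flowfile"
    return None


content = "mark logged in\nmarkus logged in\n" + "mark logged out\n" * 9
print(count_word(content, "mark"))  # 10 ("markus" is not counted)
print(email_body_if_threshold(content, "mark"))
```

The `\b` word boundaries keep longer names such as "markus" from being counted as "mark", and `re.escape` keeps any regex metacharacters in the searched word from being interpreted.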

Otherwise, everything you want can easily be done with native processors except counting occurrences of a specific string; for that, you could use ExecuteStreamCommand with awk. You'll just have to spend extra time converting the data formats back and forth to make the result useful.

Expert Contributor

@Andy LoPresto

That's a nice idea, but I don't have the leverage to use ExecuteScript or ExecuteStreamCommand: there are no scripts/programs (including awk) available to me, and getting them is out of my hands. So I'm looking for a solution within my constraints.

Thank you

Expert Contributor

@Shu

Thank you