Created on 03-25-2018 04:41 AM - edited 09-16-2022 06:01 AM
How do I get the count, along with the data, into an attribute in NiFi? Is that possible? If so, please guide me. Thanks in advance.
Created 03-25-2018 04:42 AM
@Matt Burgess
Created 03-25-2018 04:51 AM
Could you please add more details to your question? For example, are you expecting to count the number of lines in the flowfile?
Created 03-25-2018 05:01 AM
What I meant was that I want to convert the content of a flowfile into an attribute.
Created on 03-25-2018 05:28 AM - edited 08-18-2019 12:08 AM
NiFi has a CountText processor, which adds the number of lines, non-empty lines, words and characters in the text file to the flowfile as attributes.
Attributes written by the CountText processor:
Name | Description |
---|---|
text.line.count | The number of lines of text present in the FlowFile content |
text.line.nonempty.count | The number of lines of text (with at least one non-whitespace character) present in the original FlowFile |
text.word.count | The number of words present in the original FlowFile |
text.character.count | The number of characters (given the specified character encoding) present in the original FlowFile |
Example:
Suppose the flowfile content has an empty second line, and we feed this content to a CountText processor with the following configuration:
Property | Value |
---|---|
Count Lines | true |
Count Non-Empty Lines | true |
Count Words | true |
Count Characters | true |
Split Words on Symbols | true |
Output flowfile attributes:
The CountText processor adds the line count (text.line.count), non-empty line count (text.line.nonempty.count), word count (text.word.count) and character count (text.character.count) to the flowfile.
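As a rough illustration of what these counts mean, the Python sketch below computes similar values for a small sample whose second line is empty. It is not NiFi's implementation: the function name count_text and the sample text are hypothetical, words are approximated as runs of word characters (roughly what Split Words on Symbols = true does), and including newline characters in the character count is an assumption of this sketch.

```python
import re


def count_text(content: str) -> dict:
    """Hypothetical approximation of the attributes CountText writes.

    Assumptions (not NiFi's exact rules): lines are split on line breaks,
    a non-empty line has at least one non-whitespace character, words are
    runs of word characters (so "count-text" counts as two words), and the
    character count includes the newline characters.
    """
    lines = content.splitlines()
    return {
        "text.line.count": len(lines),
        "text.line.nonempty.count": sum(1 for line in lines if line.strip()),
        "text.word.count": len(re.findall(r"\w+", content)),
        "text.character.count": len(content),
    }


# Sample flowfile content with an empty second line.
sample = "hello nifi\n\ncount-text example"
print(count_text(sample))
# prints: {'text.line.count': 3, 'text.line.nonempty.count': 2,
#          'text.word.count': 5, 'text.character.count': 30}
```

The actual numbers NiFi reports may differ slightly (for example in how symbols or newlines are counted), so treat this only as a guide to what each attribute represents.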
(or)
By using the ExecuteStreamCommand processor, we can run the wc -l command to get the number of lines in the text document.
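One detail worth knowing: wc -l counts newline characters rather than lines, so a last line without a trailing newline is not counted, and the result may be one lower than CountText's line count. The Python sketch below models that behaviour; wc_l is a hypothetical helper, not part of NiFi or coreutils.

```python
def wc_l(data: bytes) -> int:
    """Hypothetical model of `wc -l`: count newline bytes, not lines."""
    return data.count(b"\n")


with_trailing_newline = b"line one\n\nline three\n"
without_trailing_newline = b"line one\n\nline three"

print(wc_l(with_trailing_newline))     # 3
print(wc_l(without_trailing_newline))  # 2, the last line has no "\n"
```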
(or)
By using the QueryRecord processor, we can count the lines in the flowfile content.
Useful links for the QueryRecord processor:
https://community.hortonworks.com/articles/140183/counting-lines-in-text-files-with-nifi.html
https://community.hortonworks.com/articles/146096/counting-lines-in-text-files-with-nifi-part-2.html
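QueryRecord runs SQL (through Apache Calcite) against the incoming records, which are exposed as a table named FLOWFILE, so a count can be written as SELECT COUNT(*) FROM FLOWFILE. The Python sketch below only imitates that idea, using the standard-library sqlite3 module as a stand-in for Calcite; count_records and the CSV sample are hypothetical, and in NiFi the configured record reader (for example a CSVReader) does the parsing.

```python
import csv
import io
import sqlite3


def count_records(csv_text: str) -> int:
    """Load CSV rows into an in-memory table named FLOWFILE and count them.

    SQLite stands in for the Calcite engine QueryRecord uses; the first CSV
    row is treated as the header, and blank lines are skipped.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(f'"{name}"' for name in header)
        conn.execute(f"CREATE TABLE FLOWFILE ({columns})")
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO FLOWFILE VALUES ({placeholders})", data)
        (count,) = conn.execute("SELECT COUNT(*) FROM FLOWFILE").fetchone()
        return count
    finally:
        conn.close()


print(count_records("id,name\n1,alice\n2,bob\n3,carol\n"))  # 3
```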
If you are using the QueryDatabaseTable or ExecuteSQL processors, the output flowfile will carry a row-count attribute (querydbtable.row.count for QueryDatabaseTable, executesql.row.count for ExecuteSQL) giving the number of rows fetched from the source.
To convert content into a flowfile attribute:
For this use case, we can use the ExtractText processor to extract the content and store it as a flowfile attribute.
ExtractText configuration:
Add a new property named data with the regex (.*), i.e. capture all of the content and store it in a flowfile attribute named data.
Set Enable DOTALL Mode to true if your flowfile content contains newlines.
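The effect of DOTALL mode can be reproduced with any regex engine. ExtractText uses Java regular expressions; the Python re module below is a stand-in, and for content that uses plain \n line breaks the behaviour of . is the same. The sample content is hypothetical.

```python
import re

# Hypothetical flowfile content spanning three lines.
content = "first line\nsecond line\nthird line"

# Without DOTALL, "." does not match "\n", so (.*) stops at the first line break.
without_dotall = re.search(r"(.*)", content).group(1)

# With DOTALL (Enable DOTALL Mode = true), "." matches "\n" too,
# so (.*) captures the entire content.
with_dotall = re.search(r"(.*)", content, re.DOTALL).group(1)

print(repr(without_dotall))  # 'first line'
print(repr(with_dotall))     # 'first line\nsecond line\nthird line'
```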
The most important properties are:
Property | Default | Description |
---|---|---|
Maximum Buffer Size | 1 MB | Specifies the maximum amount of data to buffer (per file) in order to apply the regular expressions. Files larger than the specified maximum will not be fully evaluated. |
Maximum Capture Group Length | 1024 | Specifies the maximum number of characters a given capture group value can have. Any characters beyond the max will be truncated. |
You have to increase these property values to match your flowfile size in order to get all of the flowfile content into the attribute.
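To see why both limits matter, the sketch below models them in Python: only the first Maximum Buffer Size bytes are examined, and the captured value is cut to Maximum Capture Group Length characters. It is a simplified model, not ExtractText's code; extract_all is a hypothetical name, and treating 1 MB as 1024 * 1024 bytes and decoding the content as UTF-8 are assumptions.

```python
import re

MAX_BUFFER_SIZE = 1024 * 1024    # "Maximum Buffer Size" default (1 MB, assumed binary)
MAX_CAPTURE_GROUP_LENGTH = 1024  # "Maximum Capture Group Length" default


def extract_all(content: bytes,
                max_buffer: int = MAX_BUFFER_SIZE,
                max_group_len: int = MAX_CAPTURE_GROUP_LENGTH) -> str:
    """Simplified model of ExtractText applying (.*) with DOTALL enabled."""
    # Only the first max_buffer bytes are evaluated; a multi-byte character
    # split at the boundary is dropped rather than raising an error.
    text = content[:max_buffer].decode("utf-8", errors="ignore")
    # (.*) in DOTALL mode always matches, starting at the beginning.
    captured = re.search(r"(.*)", text, re.DOTALL).group(1)
    # Characters beyond the capture group limit are truncated.
    return captured[:max_group_len]


payload = b"x" * 5000
print(len(extract_all(payload)))                        # 1024: cut by the group limit
print(len(extract_all(payload, max_group_len=10_000)))  # 5000: whole content kept
print(len(extract_all(payload, max_buffer=100, max_group_len=10_000)))  # 100: cut by the buffer
```

Only when both values exceed the size of the largest expected flowfile does the attribute hold the full content, which is also why large flowfiles lead to large in-memory attributes.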
It's not recommended to extract the entire content into attributes, because attributes are kept in memory.
Please refer to the links below for NiFi best practices and a deeper look at how NiFi works:
https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html#DeeperView
https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html#best-practice