Created 07-20-2016 03:59 AM
I want to route each line of a text FlowFile (one JSON object per line) individually, so I am trying to use RouteText.
The content of the FlowFile is as follows:
{"id": 1, "telnum": "11880", "name": "david1"}
{"id": 2, "telnum": "2226666", "name": "david2"}
{"id": 3, "telnum": "222777", "name": "david3"}
{"id": 4, "telnum": "11996", "name": "david4"}
I want the lines whose telnum starts with the prefix "11" (ids 1 and 4) to be put into one FlowFile.
The lines whose telnum starts with the prefix "222" (ids 2 and 3) should be put into another FlowFile.
According to the Expression Language Guide, the jsonPath function can be used to evaluate a JSON value.
I tried ${line:jsonPath('$.telnum'):matches('222.*')}, but it does not compile. Perhaps the attribute that holds each line is not named "line"?
How should I set the Routing Strategy and Matching Strategy properties, and the two relationship expressions for the prefixes "11" and "222"?
Thanks for your help. David.
Created 07-22-2016 09:50 AM
To make this work you need to ensure that the line attribute is populated. In this scenario you will probably want to use SplitText to create one FlowFile per line. You can then use EvaluateJsonPath to pull out the telnum property as an attribute for each line. Use that attribute either to route directly or, better, use UpdateAttribute to reduce it to just the prefix you want, and then use MergeContent with "Correlation Attribute Name" set to the attribute you are grouping by. This produces one bin of combined FlowFiles per attribute value, a bit like the GROUP BY clause in SQL, so you end up with FlowFiles containing all the entries for each prefix.
I would suggest setting a low Max Bin Age on that MergeContent to avoid introducing additional latency.
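The split, extract, and merge steps above can be sketched in plain Python to show the grouping logic. This is only an illustration of the idea, not NiFi configuration: the function names (split_lines, extract_prefix, merge_by_prefix) and the PREFIXES tuple are hypothetical, and the sample data is the FlowFile content from the question.

```python
import json
from collections import defaultdict

# Prefixes taken from the question; the names below are illustrative only.
PREFIXES = ("11", "222")

def split_lines(content):
    """Stand-in for SplitText: one item per non-empty line."""
    return [line for line in content.splitlines() if line.strip()]

def extract_prefix(line):
    """Stand-in for EvaluateJsonPath + UpdateAttribute:
    read telnum and reduce it to the prefix used for grouping."""
    telnum = json.loads(line)["telnum"]
    for prefix in PREFIXES:
        if telnum.startswith(prefix):
            return prefix
    return None  # no known prefix

def merge_by_prefix(lines):
    """Stand-in for MergeContent with a correlation attribute:
    lines with the same prefix end up in the same bin."""
    bins = defaultdict(list)
    for line in lines:
        bins[extract_prefix(line)].append(line)
    return {key: "\n".join(group) for key, group in bins.items()}

content = """\
{"id": 1, "telnum": "11880", "name": "david1"}
{"id": 2, "telnum": "2226666", "name": "david2"}
{"id": 3, "telnum": "222777", "name": "david3"}
{"id": 4, "telnum": "11996", "name": "david4"}
"""

merged = merge_by_prefix(split_lines(content))
for prefix, body in merged.items():
    print(prefix)
    print(body)
```

Each key of merged corresponds to one merged FlowFile: the "11" bin holds ids 1 and 4, and the "222" bin holds ids 2 and 3.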
Created 07-25-2016 01:45 AM
Thanks for your reply. In your scenario, the FlowFile needs to be split and then merged.
The Admin Guide says that NiFi keeps FlowFile information in memory (the JVM), but during surges of incoming data it "swaps" the FlowFile information to disk temporarily.
I wonder whether the split and merge steps will add a performance cost. If so, I think it would be better to route or update each line of the content within a single FlowFile.
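For comparison, the single-pass alternative described here (routing each line without first splitting the FlowFile) might look like the following plain Python sketch. It only illustrates the logic and says nothing about actual NiFi performance. The ROUTES mapping, the route names, and route_lines are hypothetical, and re.fullmatch is used on the assumption that matches('222.*') is meant to match the whole telnum value.

```python
import json
import re

# Hypothetical route names mapped to the patterns from the question.
ROUTES = {
    "prefix_11": re.compile(r"11.*"),
    "prefix_222": re.compile(r"222.*"),
}

def route_lines(content):
    """Walk the content once and append each line to the first
    route whose pattern matches its telnum value."""
    routed = {name: [] for name in ROUTES}
    routed["unmatched"] = []
    for line in content.splitlines():
        if not line.strip():
            continue  # skip blank lines
        telnum = json.loads(line)["telnum"]
        for name, pattern in ROUTES.items():
            if pattern.fullmatch(telnum):
                routed[name].append(line)
                break
        else:
            # No pattern matched this line.
            routed["unmatched"].append(line)
    return routed

content = """\
{"id": 1, "telnum": "11880", "name": "david1"}
{"id": 2, "telnum": "2226666", "name": "david2"}
{"id": 3, "telnum": "222777", "name": "david3"}
{"id": 4, "telnum": "11996", "name": "david4"}
"""

routed = route_lines(content)
for name, lines in routed.items():
    print(name, len(lines))
```

Here each route collects its lines in one pass over the content, so no per-line items are created and merged back together.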