How to use RouteText to match a prefix on each line of the content

Contributor

I want to route each line of the textual content (JSON) individually, so I am trying to use RouteText.

The content of the FlowFile is as below:

{"id": 1, "telnum": "11880", "name": "david1"}

{"id": 2, "telnum": "2226666", "name": "david2"}

{"id": 3, "telnum": "222777", "name": "david3"}

{"id": 4, "telnum": "11996", "name": "david4"}

I want the lines whose telnum matches the prefix "11" (ids 1 and 4) to be put into one FlowFile.

The lines whose telnum matches the prefix "222" (ids 2 and 3) should be put into another FlowFile.

According to the Expression Language Guide, the jsonPath function can be used to evaluate a JSON value.

I tried ${line:jsonPath('$.telnum'):matches('222.*')}, but it does not compile correctly. Perhaps the attribute name for each line is not "line".

How should I set the properties Routing Strategy and Matching Strategy, and the two relationship expressions for the prefixes "11" and "222"?

Thanks for your help. David.

1 ACCEPTED SOLUTION

Guru

To make this work you need to ensure that the line attribute is populated. In this scenario it looks like you will want to use SplitText to create FlowFiles one line at a time. You can then use EvaluateJsonPath to pull out the telnum property as an attribute for each line. Use that attribute either to route directly or, better, run it through UpdateAttribute so that it holds just the prefix you want, and then use MergeContent with "Correlation Attribute Name" set to the attribute you are grouping by. This produces a number of bins of combined files, a bit like the GROUP BY clause in SQL, and gives you FlowFiles containing all the entries for each given prefix.

I would suggest setting a low "Max Bin Age" on that MergeContent to avoid introducing additional latency.
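
To make the split / extract / group-by idea concrete, here is a minimal Python sketch of the same logic outside NiFi. It is only an illustration of the grouping, not NiFi configuration: the function name group_lines_by_prefix, the PREFIXES tuple and the "unmatched" bucket are hypothetical names chosen for this example, and the sample data is the content from the question.

```python
import json
from collections import defaultdict

# Prefixes from the question; hypothetical constant, not a NiFi property.
PREFIXES = ("11", "222")


def group_lines_by_prefix(content, prefixes=PREFIXES):
    """Split content into lines, read telnum from each JSON line,
    and merge the lines that share a prefix into one text block."""
    bins = defaultdict(list)
    for line in content.splitlines():          # roughly the SplitText step
        line = line.strip()
        if not line:
            continue                           # ignore blank lines
        telnum = json.loads(line)["telnum"]    # roughly the JSON path step
        # Reduce telnum to its prefix; lines with no known prefix
        # go to a separate "unmatched" bucket (an assumption of this sketch).
        key = next((p for p in prefixes if telnum.startswith(p)), "unmatched")
        bins[key].append(line)
    # Roughly the merge step: one combined block per correlation key.
    return {key: "\n".join(lines) for key, lines in bins.items()}


sample = """{"id": 1, "telnum": "11880", "name": "david1"}
{"id": 2, "telnum": "2226666", "name": "david2"}
{"id": 3, "telnum": "222777", "name": "david3"}
{"id": 4, "telnum": "11996", "name": "david4"}"""

merged = group_lines_by_prefix(sample)
for prefix, block in merged.items():
    print(prefix)
    print(block)
```

With the sample above, the "11" block holds the lines for ids 1 and 4 and the "222" block holds the lines for ids 2 and 3, which is the grouping asked for in the question.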

Contributor

Thanks for your reply. In your scenario, the FlowFile needs to be split and then merged.

The Admin Guide says: NiFi keeps FlowFile information in memory (the JVM), but during surges of incoming data, NiFi "swaps" the FlowFile information to disk temporarily.

I wonder whether the split and merge procedure will add a performance cost. If so, I think it would be better to route or update each line of the content within a single FlowFile.