Search value in the ReplaceText of NiFi does not parse regex being passed through an attribute

The following works great in the NiFi ReplaceText Processor

Flowfile Content:

US0706003247984600Z1Z000123371K
US0706003247984600Z1Z000125491K
US0706003247984600Z1Z000125596K

Search Value:

(.{2})(?:.{4})(.{6})(.{2})(.{4})(.{6})(.{6})(.{1})

Replacement Value:

{col_foo1:$1,col_foo3:$2,col_foo4:$3,col_foo5:$4,col_foo6:$5,col_foo7:$6,col_foo8:$7},

Output:

{col_foo1:US,col_foo3:003247,col_foo4:98,col_foo5:4600,col_foo6:Z1Z000,col_foo7:123371,col_foo8:K},
{col_foo1:US,col_foo3:003247,col_foo4:98,col_foo5:4600,col_foo6:Z1Z000,col_foo7:125491,col_foo8:K},
{col_foo1:US,col_foo3:003247,col_foo4:98,col_foo5:4600,col_foo6:Z1Z000,col_foo7:125596,col_foo8:K},
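
For readers who want to try the hardcoded case outside NiFi, here is a minimal Python sketch of the same search and replace. It is only an illustration: NiFi uses Java regular expressions, where back-references in the replacement are written `$1`, `$2`, and so on, while Python writes them `\g<1>`, `\g<2>`. The helper `to_python_template` is a hypothetical name introduced here to bridge that difference; it is not part of NiFi.

```python
import re

# Search Value from the post (this subset of Java regex is also valid in Python).
SEARCH_VALUE = r"(.{2})(?:.{4})(.{6})(.{2})(.{4})(.{6})(.{6})(.{1})"

# Replacement Value from the post, with Java-style $1..$7 back-references.
REPLACEMENT_VALUE = (
    "{col_foo1:$1,col_foo3:$2,col_foo4:$3,col_foo5:$4,"
    "col_foo6:$5,col_foo7:$6,col_foo8:$7},"
)


def to_python_template(java_replacement: str) -> str:
    """Rewrite $n back-references in Python's \\g<n> form (hypothetical helper)."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", java_replacement)


def replace_line(line: str) -> str:
    """Apply the search/replace pair to one line of flowfile content."""
    return re.sub(SEARCH_VALUE, to_python_template(REPLACEMENT_VALUE), line)


if __name__ == "__main__":
    content = [
        "US0706003247984600Z1Z000123371K",
        "US0706003247984600Z1Z000125491K",
        "US0706003247984600Z1Z000125596K",
    ]
    for line in content:
        print(replace_line(line))
```

Note that the second group, `(?:.{4})`, is non-capturing, which is why the four characters `0706` are dropped and there is no `col_foo2` in the replacement.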

However, I need to store the Search Value in an attribute (e.g. search.value) and the Replacement Value in an attribute (e.g. replacement.value), both of which will be passed in via a configuration file.

Flowfile Content:

US0706003247984600Z1Z000123371K
US0706003247984600Z1Z000125491K
US0706003247984600Z1Z000125596K

Search Value:

${search.value}

search.value Attribute:

(.{2})(?:.{4})(.{6})(.{2})(.{4})(.{6})(.{6})(.{1})

Replacement Value:

${replacement.value}

replacement.value Attribute:

{col_foo1:$1,col_foo3:$2,col_foo4:$3,col_foo5:$4,col_foo6:$5,col_foo7:$6,col_foo8:$7},

Output:

US0706003247984600Z1Z000123371K
US0706003247984600Z1Z000125491K
US0706003247984600Z1Z000125596K

The content comes out unchanged, which appears to indicate that the regex in each attribute value is not being evaluated properly.

Any ideas are greatly appreciated.

3 Replies

@Michael Vogt

The documentation of the ReplaceText processor is a little confusing on this point. Using attributes doesn't work that way for this processor: it treats the property text literally rather than using the value of the attribute. That is why the resulting output isn't the same as in the hardcoded case.
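
One way to picture the behaviour described in this reply is the small Python sketch below. It assumes, purely for illustration, that the text typed into the Search Value property is matched as a literal string; it models the claim above and is not NiFi's actual implementation. The function name `replace_with_literal_search` is hypothetical.

```python
import re

LINE = "US0706003247984600Z1Z000123371K"

# The text typed into the processor properties in the second attempt.
SEARCH_PROPERTY = "${search.value}"
REPLACEMENT_PROPERTY = "${replacement.value}"


def replace_with_literal_search(line: str, search_text: str, replacement: str) -> str:
    """Search for search_text as a literal string (assumed behaviour, not NiFi code)."""
    # re.escape makes every character match only itself, so "$", "{" and "."
    # lose their regex meaning. A function is used as the replacement so the
    # replacement text is inserted verbatim if a match ever occurs.
    return re.sub(re.escape(search_text), lambda _match: replacement, line)


if __name__ == "__main__":
    # The literal text "${search.value}" never occurs in the data,
    # so the line passes through unchanged.
    print(replace_with_literal_search(LINE, SEARCH_PROPERTY, REPLACEMENT_PROPERTY))
```

Under that assumption nothing in the flowfile matches, which is consistent with the unchanged output shown in the question.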

Hi @Michael Vogt,

To greatly simplify regular expressions for fixed-width files, you can use Grok. The “ExtractGrok” processor can be used to pull out fixed-length values; for example:

https://groups.google.com/forum/#!topic/logstash-users/7FETqn3PB1M

Using the following data:

Time Sequence Source Destination Action Data

---------- -------- ------------ ------------ ------------------------------ ------------------------------------------------------------------------------------ Start 2013/04/29 00:01:34

> 00:01:34 Yosemite Daily Rollover

? 02:18:56 02185130 Yosemite bioWatch Trak Alert WS Failed Return=Serial Not Found.

? 02:19:03 Yosemite AlertNotify ERROR: Conversion from string "" to type 'Date' is not valid.

* 02:19:03 Yosemite AlertNotify Failed Serial=L1234567890 Setting=AUTOREPORT

I want to be able to get the Time, Sequence, Source, Destination, Action and Data fields from the fixed-length data above. Writing regular expressions by hand can be difficult, which is the problem Grok was created to simplify.

I built the following workflow using:

  1) GetFile – fetch the file (with the data above)
  2) SplitText – split the file up into 1 flowfile per line
  3) ExtractGrok – use a Grok expression to pull out Time (grok.time attribute), Sequence (grok.sequence attribute), Source (grok.source attribute), Destination (grok.destination attribute), Action (grok.action attribute) and Data (grok.data attribute).

39468-image001.png

My Grok pattern:

(?<severity>.{1}) (?<time>.{8}) (?<sequence>.{8}) (?<source>.{12}) (?<destination>.{12}) (?<action>.{30}) %{GREEDYDATA:data}

If you look at the data above, there are a total of 6 lines, of which 5 match my Grok pattern. I likely wouldn’t want to collect the unmatched flowfiles, because there will always be an unmatched line if the file contains “---------- -------- ------------ ------------ ------------------------------ ------------------------------------------------------------------------------------ Start 2013/04/29 00:01:34”.

39467-image002.png

The Grok pattern file is attached. I used one I found on Google that had a bunch of pre-defined regular expressions.

Grok will output the attributes as I named them in my Grok expression; on each FlowFile, every captured group is stored under the attribute I specified:

39466-image003.png
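
For anyone without NiFi at hand, the fixed-width extraction above can be approximated in plain Python. This is a rough sketch under several assumptions: `%{GREEDYDATA:data}` is written out as `.*`, Python's `(?P<name>...)` syntax replaces the `(?<name>...)` form used in the Grok pattern, and the sample line is rebuilt with padding because the data shown above has lost its original column spacing (which value sits in which column is my guess). The function name `extract_attributes` and the whitespace stripping are illustrative choices, not ExtractGrok behaviour.

```python
import re

# Same field widths as the Grok pattern above, in Python named-group syntax.
FIXED_WIDTH = re.compile(
    r"(?P<severity>.{1}) (?P<time>.{8}) (?P<sequence>.{8}) (?P<source>.{12}) "
    r"(?P<destination>.{12}) (?P<action>.{30}) (?P<data>.*)"
)


def extract_attributes(line: str) -> dict:
    """Return grok.<name> attributes for a matching line, or {} if it does not match."""
    match = FIXED_WIDTH.match(line)
    if match is None:
        return {}
    # Padding is stripped here for readability; that is a choice made in this sketch.
    return {"grok." + name: value.strip() for name, value in match.groupdict().items()}


# Reconstructed sample line: each column is padded to the width the pattern expects.
SAMPLE = " ".join([
    "*",
    "02:19:03",
    "".ljust(8),              # empty Sequence column
    "Yosemite".ljust(12),
    "AlertNotify".ljust(12),
    "Failed".ljust(30),
    "Serial=L1234567890 Setting=AUTOREPORT",
])

if __name__ == "__main__":
    for key, value in extract_attributes(SAMPLE).items():
        print(key, "=", repr(value))
```

Lines that are too short or whose separators do not fall in the expected positions return an empty dictionary here, which loosely mirrors flowfiles going to the unmatched relationship.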

Thanks Ryan.

Can you please verify the following would work?

The value of the FlowFile attribute grok.expression is

(?<severity>.{1}) (?<time>.{8}) (?<sequence>.{8}) (?<source>.{12}) (?<destination>.{12}) (?<action>.{30}) %{GREEDYDATA:data}

Within Configure Processor of the ExtractGrok Processor, the value of Grok Expression is

${grok.expression}

The expected behavior is that the ExtractGrok Processor would continue to work as though the Grok Expression were hardcoded with (?<severity>.{1}) (?<time>.{8}) (?<sequence>.{8}) (?<source>.{12}) (?<destination>.{12}) (?<action>.{30}) %{GREEDYDATA:data}