NiFi: Extract timestamp from log without using Regex

Expert Contributor

Is there a way to use the NiFi Expression Language instead of a traditional regular expression to get the timestamp from an event in this format?

2017-05-04 14:43:17,302 foo bar foo bar

I assume I would need something that finds the second whitespace, takes everything before it, and saves it as an attribute; I could then easily substring that attribute into its parts.
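For reference, the "second whitespace" logic described above is simple to express in plain Java with `indexOf`/`substring` and no regex. This is only a sketch of the idea outside NiFi (the class and method names are hypothetical); inside NiFi the same logic would still need the line to be in an attribute first, or to run in a script or a custom processor.

```java
// Sketch of the "take everything before the second space" idea, using only
// String.indexOf/substring (no regular expressions). Names are hypothetical.
class SecondSpaceSplit {

    /** Returns everything before the second space, or null if there are fewer than two spaces. */
    static String beforeSecondSpace(String line) {
        int first = line.indexOf(' ');
        if (first < 0) {
            return null; // no space at all
        }
        int second = line.indexOf(' ', first + 1);
        if (second < 0) {
            return null; // only one space
        }
        return line.substring(0, second);
    }

    public static void main(String[] args) {
        String event = "2017-05-04 14:43:17,302 foo bar foo bar";
        String timestamp = beforeSecondSpace(event);
        System.out.println(timestamp);                 // the date-and-time prefix
        System.out.println(timestamp.substring(0, 4)); // the year part
    }
}
```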

Accepted solution

Super Mentor

@Eric Lloyd

The NiFi Expression Language was written specifically for working with NiFi attributes. You would first need to use the ExtractText processor to move the relevant bits of your content into NiFi FlowFile attributes:

Add a new property to your ExtractText processor, configured as follows:

[Screenshot: ExtractText property configuration (image not available)]

Note that my regex above has a whitespace character at the end.

This regex results in several new FlowFile attributes being created for you:

[Screenshot: resulting FlowFile attributes (image not available)]

So there is no need to follow up with any NiFi Expression Language substring manipulation.

Thanks,

Matt
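The actual regex from this answer is only in the screenshot, so the sketch below uses a hypothetical pattern with one capture group per timestamp field plus the trailing space. It imitates, in plain Java, how ExtractText names the attributes it creates, as I understand the processor documentation: the property name holds capture group 1, and indexed attributes (`name.0` for the whole match, `name.1`, `name.2`, ...) hold each group. Treat it as an approximation for understanding the output, not NiFi's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractTextSketch {

    // Hypothetical pattern (NOT the one from the screenshot): one capture group per
    // timestamp field, ending with the space that follows the timestamp.
    static final Pattern TS = Pattern.compile(
            "^(\\d{4})-(\\d{2})-(\\d{2}) (\\d{2}):(\\d{2}):(\\d{2}),(\\d{3}) ");

    /**
     * Approximates ExtractText's attribute naming for one property:
     * name -> capture group 1, name.0 -> whole match, name.N -> capture group N.
     */
    static Map<String, String> extract(String name, Pattern pattern, String content) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = pattern.matcher(content);
        if (!m.find()) {
            return attrs; // no match: no attributes are added
        }
        if (m.groupCount() >= 1) {
            attrs.put(name, m.group(1));
        }
        for (int i = 0; i <= m.groupCount(); i++) {
            attrs.put(name + "." + i, m.group(i));
        }
        return attrs;
    }

    public static void main(String[] args) {
        String event = "2017-05-04 14:43:17,302 foo bar foo bar";
        extract("ts", TS, event).forEach((k, v) -> System.out.println(k + " = '" + v + "'"));
    }
}
```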


7 replies

Master Guru

You can only apply EL to FlowFile attributes, not to the content of a FlowFile. So you would first have to use ExtractText to extract the whole content of the FlowFile into an attribute (assuming that example message is the content of a FlowFile). Only do this if you are certain the messages are of a reasonable size and fit in memory.

Once you have a flow file attribute you can apply any of the EL functions described here:

https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html

Expert Contributor

So you are saying there is no way to extract a timestamp from the content of a FlowFile without using a regex, correct?

Master Guru

Well, the options are:

1) ExtractText with a regex

2) ExtractText to get the whole content into an attribute, then EL

3) A Groovy/Jython/etc. script that ExecuteScript runs to parse your data

4) A custom Java processor that knows how to parse your data format
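Because the timestamp `2017-05-04 14:43:17,302` has a fixed layout, option 2 can avoid regex entirely: once the line is in an attribute, each field sits at a known character position. Below is a plain-Java sketch of those offsets (0-based, end-exclusive, as with Java's `String.substring`), assuming every event starts with a timestamp in exactly this format; the field names are hypothetical. NiFi EL also has a `substring` function taking start and end indexes, but check the Expression Language guide for its exact semantics before reusing these numbers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FixedOffsetFields {

    /**
     * Splits a "yyyy-MM-dd HH:mm:ss,SSS" prefix into fields by position.
     * Assumes every event starts with a timestamp in exactly this layout.
     */
    static Map<String, String> fields(String line) {
        if (line.length() < 23) {
            throw new IllegalArgumentException("line too short for timestamp: " + line);
        }
        Map<String, String> f = new LinkedHashMap<>();
        f.put("year", line.substring(0, 4));
        f.put("month", line.substring(5, 7));
        f.put("day", line.substring(8, 10));
        f.put("hour", line.substring(11, 13));
        f.put("minute", line.substring(14, 16));
        f.put("second", line.substring(17, 19));
        f.put("millis", line.substring(20, 23));
        return f;
    }

    public static void main(String[] args) {
        System.out.println(fields("2017-05-04 14:43:17,302 foo bar foo bar"));
    }
}
```

The trade-off is robustness: a fixed-offset split silently produces wrong fields if the log format ever changes, whereas a regex simply fails to match.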

Expert Contributor

Thanks. An external script seems likely to be slower than a regex, and while a custom Java processor is appealing, I don't believe that's the direction we want to go.

Expert Contributor

Thanks, Matt. That's a better solution, though it still uses a regex. I'm guessing it's impossible to get the timestamp from the log without a regular expression; it seems your regex is better than mine.

I will still need a follow-up NiFi Expression Language substring step, because I need the year, month, day, hour, etc. saved into separate attributes in order to build my files' filenames and their directory in HDFS correctly.
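If the main goal is a date-based HDFS directory, it may be simpler to parse the timestamp once and format the path from it, rather than keeping each field in its own attribute. The sketch below does this in plain Java with `java.time`; the base directory and the `yyyy/MM/dd/HH` layout are assumptions for illustration only. Within NiFi, the Expression Language guide linked earlier documents date functions such as `toDate` and `format` that may do the same job on an attribute; check the guide for the exact syntax.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class HdfsPathFromTimestamp {

    // Layout of the timestamp at the start of each event, e.g. "2017-05-04 14:43:17,302".
    static final DateTimeFormatter IN = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Hypothetical directory layout: one directory level per year/month/day/hour.
    static final DateTimeFormatter DIR = DateTimeFormatter.ofPattern("yyyy/MM/dd/HH");

    /** Parses the timestamp and appends the formatted date path to the base directory. */
    static String directoryFor(String baseDir, String timestamp) {
        LocalDateTime t = LocalDateTime.parse(timestamp, IN);
        return baseDir + "/" + t.format(DIR);
    }

    public static void main(String[] args) {
        // "/data/logs" is a placeholder base directory, not taken from the thread.
        System.out.println(directoryFor("/data/logs", "2017-05-04 14:43:17,302"));
    }
}
```

Parsing also validates the timestamp: a malformed value throws a `DateTimeParseException` instead of yielding a nonsense path.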

Super Mentor

@Eric Lloyd

You can still use ExtractText to get all the bits broken out at once by adding multiple new properties:

[Screenshot: ExtractText configured with multiple new properties (image not available)]

Thanks,

Matt
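With multiple properties, each property's regex needs only one capture group, and ExtractText stores that group under the property name, so each field lands in its own attribute without any substring step. The plain-Java imitation below uses hypothetical property names and patterns (the real ones are only in the screenshot), and it models only the named attribute, not the indexed ones ExtractText would also add.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiPropertyExtract {

    // Hypothetical ExtractText dynamic properties: attribute name -> regex with one capture group.
    static final Map<String, Pattern> PROPERTIES = new LinkedHashMap<>();
    static {
        PROPERTIES.put("year", Pattern.compile("^(\\d{4})-"));
        PROPERTIES.put("month", Pattern.compile("^\\d{4}-(\\d{2})-"));
        PROPERTIES.put("day", Pattern.compile("^\\d{4}-\\d{2}-(\\d{2}) "));
        PROPERTIES.put("hour", Pattern.compile("^\\S+ (\\d{2}):"));
    }

    /**
     * For each property, stores capture group 1 under the property name.
     * (ExtractText would additionally add indexed attributes such as year.0 and year.1.)
     */
    static Map<String, String> apply(String content) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> p : PROPERTIES.entrySet()) {
            Matcher m = p.getValue().matcher(content);
            if (m.find()) {
                attrs.put(p.getKey(), m.group(1));
            }
        }
        return attrs;
    }

    public static void main(String[] args) {
        System.out.println(apply("2017-05-04 14:43:17,302 foo bar foo bar"));
    }
}
```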