Created 05-04-2017 04:28 PM
Is there a way to use the NiFi Expression Language instead of a traditional regex to get the timestamp from an event of this format?
2017-05-04 14:43:17,302 foo bar foo bar
I assume I'd need something that finds the second whitespace, takes everything before it, and saves it as an attribute; then I could easily substring that attribute into its parts.
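The idea above (cut at the second whitespace, then substring the result) can be checked in plain Python outside NiFi. This is a minimal sketch that assumes every event starts with a zero-padded `yyyy-MM-dd HH:mm:ss,SSS` timestamp; the function name `split_timestamp` is hypothetical.

```python
def split_timestamp(event: str) -> dict:
    """Take everything before the second space, then slice it into fields.

    Assumes the event begins with a fixed-width 'yyyy-MM-dd HH:mm:ss,SSS'
    timestamp, as in the sample line.
    """
    first = event.index(" ")               # space between date and time
    second = event.index(" ", first + 1)   # space right after the timestamp
    ts = event[:second]                    # e.g. '2017-05-04 14:43:17,302'
    # Fixed offsets only work because every field is zero-padded.
    return {
        "timestamp": ts,
        "year": ts[0:4],
        "month": ts[5:7],
        "day": ts[8:10],
        "hour": ts[11:13],
        "minute": ts[14:16],
        "second": ts[17:19],
        "millis": ts[20:23],
    }


parts = split_timestamp("2017-05-04 14:43:17,302 foo bar foo bar")
print(parts["timestamp"], parts["year"], parts["hour"])
# prints: 2017-05-04 14:43:17,302 2017 14
```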
Created on 05-04-2017 04:44 PM - edited 08-17-2019 07:21 PM
The NiFi Expression Language was written specifically for working with NiFi FlowFile attributes. You would first need to use the ExtractText processor to move the relevant bits of your content into FlowFile attributes:
Add a new property to your ExtractText processor, configured as follows:
Note that my Regex above has a white space at the end.
This regex will result in multiple new FlowFile Attributes being created for you:
So there is no need to follow up with any NiFi Expression Language substring functions.
Thanks,
Matt
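Matt's actual regex and the resulting attribute list are not reproduced in this thread. The sketch below emulates, in plain Python, how ExtractText names attributes for one user-defined property as I understand its documentation: the bare property name holds the first capture group, `name.0` holds the whole match (when *Include Capture Group 0* is left at its default of true), and `name.1` through `name.N` hold each group. The property name `ts` and the regex `TS_REGEX` are hypothetical; like Matt's, the regex ends with a space.

```python
import re


def extract_text(content: str, prop: str, pattern: str,
                 include_group_0: bool = True) -> dict:
    """Emulate ExtractText attribute naming for a single property.

    Sketch only: uses Python's re module instead of Java's regex engine
    and ignores options such as buffer size and named capture groups.
    """
    m = re.search(pattern, content)
    if m is None:
        return {}  # no match: ExtractText would route to 'unmatched'
    attrs = {}
    if include_group_0:
        attrs[f"{prop}.0"] = m.group(0)
    for i, value in enumerate(m.groups(), start=1):
        if value is not None:  # optional groups that did not match are skipped
            attrs[f"{prop}.{i}"] = value
    if m.groups() and m.group(1) is not None:
        attrs[prop] = m.group(1)  # bare name holds the first capture group
    return attrs


# Hypothetical regex: group 1 is the full timestamp, groups 2-8 its parts.
TS_REGEX = r"^((\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}),(\d{3})) "
attrs = extract_text("2017-05-04 14:43:17,302 foo bar foo bar", "ts", TS_REGEX)
print(attrs["ts"], attrs["ts.2"], attrs["ts.5"])
# prints: 2017-05-04 14:43:17,302 2017 14
```

Nesting the parts inside one outer group gives both the full timestamp and each field from a single property, which is why no substring step is needed afterwards.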
Created 05-04-2017 04:36 PM
You can only evaluate Expression Language (EL) against flow file attributes, not against the content of a flow file. So you would first have to use ExtractText to extract the whole content of the flow file into an attribute (assuming that example message is the content of a flow file). Only do this if you are certain the messages are of a reasonable size and will fit in memory.
Once you have a flow file attribute you can apply any of the EL functions described here:
https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
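The "whole content into an attribute, then EL" route can be sketched in plain Python as below. An assumption worth checking against the ExtractText documentation: I recall that ExtractText caps each captured value via its *Maximum Capture Group Length* property (1024 characters by default), so a large flow file would be truncated; that cap is modeled by the hypothetical `max_len` parameter. The final slice mirrors what the EL expression `${content:substring(0, 23)}` would return, where `content` is a hypothetical attribute name.

```python
import re

# Assumed default of ExtractText's "Maximum Capture Group Length".
MAX_CAPTURE = 1024


def capture_whole_content(content: str, max_len: int = MAX_CAPTURE) -> str:
    """Mimic a property whose regex is (?s)(.*): capture everything, then cap it."""
    m = re.match(r"(?s)(.*)", content)  # (?s) lets '.' also match newlines
    return m.group(1)[:max_len]


def el_substring(value: str, start: int, end: int) -> str:
    """Python stand-in for the EL call ${attr:substring(start, end)}."""
    return value[start:end]  # start inclusive, end exclusive


content = "2017-05-04 14:43:17,302 foo bar foo bar\n" + "x" * 2000
attr = capture_whole_content(content)
print(len(attr), el_substring(attr, 0, 23))
# prints: 1024 2017-05-04 14:43:17,302
```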
Created 05-04-2017 04:40 PM
So you're saying there is no way to extract a timestamp from the content of a flow file without using regex, correct?
Created 05-04-2017 04:44 PM
Well, the options are:
1) ExtractText with regex
2) ExtractText to get whole content into attribute then use EL
3) A Groovy/Jython/etc. script, run by ExecuteScript, that parses your data
4) Custom Java processor that knows how to parse your data format
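For options 3 and 4, the parsing itself need not use a regex at all. The sketch below is plain CPython 3, not an ExecuteScript script (the ExecuteScript session API is not shown), and assumes the first two space-separated tokens are always the date and the time; `parse_event_time` is a hypothetical name.

```python
from datetime import datetime


def parse_event_time(event: str) -> datetime:
    """Parse the leading timestamp of a log event without a regex.

    Assumes the first two space-separated tokens are the date and the
    time, with milliseconds after a comma.
    """
    date_part, time_part = event.split(" ", 2)[:2]
    # %f accepts 1 to 6 digits, so '302' is read as 302000 microseconds.
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S,%f")


ts = parse_event_time("2017-05-04 14:43:17,302 foo bar foo bar")
print(ts.year, ts.month, ts.day, ts.hour, ts.microsecond)
# prints: 2017 5 4 14 302000
```

Parsing into a real date type also validates the value (an impossible date such as month 13 raises `ValueError`), which fixed-offset substrings do not.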
Created 05-04-2017 06:18 PM
Thanks. An external script seems worse in terms of processing time than a regex would be, and while a custom Java processor seems appealing, I don't believe that's the direction we want to go.
Created 05-04-2017 04:55 PM
Thanks, Matt. That's a better solution, though still using regex, I guess. I'm guessing it's impossible to get the timestamp from the log without a regular expression; your regex does seem better than mine.
I will still need a follow-up NiFi Expression Language substring step, because I need the month, day, year, hour, etc. saved into separate attributes so I can correctly form my files' names and their directory in HDFS.
Created on 05-04-2017 05:19 PM - edited 08-17-2019 07:20 PM
You can still use ExtractText to get all the bits broken out at once by adding multiple new properties:
Thanks,
Matt
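The multiple-property configuration described above is not reproduced in the thread. A minimal plain-Python sketch, assuming hypothetical property names `year`, `month`, `day`, and `hour`, each with its own single-capture-group regex: every property yields its own attribute, which could then feed a path such as `/data/logs/${year}/${month}/${day}/${hour}` in an EL-enabled property like PutHDFS's Directory. The `/data/logs` base and the layout are hypothetical.

```python
import re

# Hypothetical ExtractText properties: attribute name -> regex with one group.
PROPERTIES = {
    "year": r"^(\d{4})-",
    "month": r"^\d{4}-(\d{2})-",
    "day": r"^\d{4}-\d{2}-(\d{2}) ",
    "hour": r"^\d{4}-\d{2}-\d{2} (\d{2}):",
}


def extract_all(content: str, properties: dict) -> dict:
    """Evaluate each property's regex independently; its first capture
    group becomes an attribute named after the property. (ExtractText
    would also add indexed attributes such as year.0; omitted here.)"""
    attrs = {}
    for name, pattern in properties.items():
        m = re.search(pattern, content)
        if m:
            attrs[name] = m.group(1)
    return attrs


def hdfs_directory(attrs: dict, base: str = "/data/logs") -> str:
    """Hypothetical layout, like the EL '/data/logs/${year}/${month}/${day}/${hour}'."""
    return "/".join([base, attrs["year"], attrs["month"], attrs["day"], attrs["hour"]])


attrs = extract_all("2017-05-04 14:43:17,302 foo bar foo bar", PROPERTIES)
print(hdfs_directory(attrs))
# prints: /data/logs/2017/05/04/14
```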