Created on 12-19-2019 08:10 PM - last edited on 12-19-2019 08:13 PM by Robert Justice
Hi all,
I am new to NiFi and need guidance on achieving the desired result.
Scenario:
1. Multiple .txt log files
2. Each .txt log file contains many lines
Requirement:
1. Read each .txt log file and extract only those lines that contain "Three.Link resp:". The snippet below, for example, is from abc.txt:
09/10/18 20:06:07.581 DEBUG: (Radio) One.Link DONE
09/10/18 20:06:07.963 34, 0, 0, 0, 0, -99
09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request
09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99
09/10/18 20:06:08.601 DEBUG: (Radio) One.Link DONE
09/10/18 20:06:08.983 34, 0, 0, 0, 0, -99
09/10/18 20:06:09.600 DEBUG: (Radio) Two.Link request
09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99
09/10/18 20:06:09.611 DEBUG: (Radio) One.Link DONE
What I am trying to do is extract only those lines from abc.txt that contain "Three.Link resp:" and write them to another file containing only those lines, as shown below:
09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99
09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99
I used the ExtractText processor with this regex:
^.*Three.Link resp.*$
This works correctly on regex101.com, where the regex extracts the entire line from the text:
https://regex101.com/r/Ggtl74/2
But when I place the same regex in the ExtractText processor, it does not work at all.
Can anyone please advise how to achieve this? Why does the NiFi processor not seem to apply the regex, or am I not understanding something here?
Thanks in advance.
Created 12-24-2019 03:39 AM
ExtractText is for getting some text out of the content and putting it in an attribute, which does not sound like what you want. It also matches the regex against the whole flowfile, so again it is probably not what you want.
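To illustrate why the same pattern can succeed in an online tester and fail on a whole file, here is a minimal Python sketch (not NiFi code). It assumes the regex is evaluated against the entire content with multiline mode off, in which case `^` and `$` anchor to the start and end of the content rather than to each line. Python's `re` module is only a stand-in for the Java regex engine, and the `content` sample is taken from the question.

```python
import re

# A few lines from abc.txt, joined the way they would sit in one flowfile.
content = "\n".join([
    "09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request",
    "09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99",
    "09/10/18 20:06:08.601 DEBUG: (Radio) One.Link DONE",
])

pattern = r"^.*Three.Link resp.*$"

# Multiline mode off: ^ can only match at the very start of the content,
# and "." does not cross line breaks, so the search finds nothing.
whole_content_match = re.search(pattern, content)

# Multiline mode on: ^ and $ anchor to each line, so the line is found.
per_line_matches = re.findall(pattern, content, flags=re.MULTILINE)

print(whole_content_match)
print(per_line_matches)
```

ExtractText has an Enable Multiline Mode property that appears to default to false, so it is worth checking in your NiFi version. Even with it enabled, the matched text ends up in an attribute rather than in the content.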
If you only want to keep certain lines from a flowfile, the processor to use seems to be RouteText.
Here is an example of this: https://community.cloudera.com/t5/Support-Questions/Filtering-records-from-a-file-using-NiFi/td-p/18...
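As a rough picture of what line-based filtering does, here is a small Python sketch. It is not how RouteText is implemented; it only assumes the processor evaluates each line on its own and keeps the ones that contain a given string. The function name `keep_matching_lines` is made up for this example, and the sample lines come from the question.

```python
def keep_matching_lines(content: str, needle: str) -> str:
    """Return only the lines of content that contain needle, in original order."""
    kept = [line for line in content.splitlines() if needle in line]
    return "\n".join(kept)


sample = "\n".join([
    "09/10/18 20:06:07.581 DEBUG: (Radio) One.Link DONE",
    "09/10/18 20:06:07.963 34, 0, 0, 0, 0, -99",
    "09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request",
    "09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99",
    "09/10/18 20:06:08.601 DEBUG: (Radio) One.Link DONE",
    "09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99",
])

# The whole file goes in once and one filtered text comes out.
filtered = keep_matching_lines(sample, "Three.Link resp:")
print(filtered)
```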
Created 12-24-2019 10:12 AM
@DennisJaheruddi ... This definitely helps. I tested it and it seems to have extracted the required lines. I created the regex below and applied it in the RouteText processor, and so far it seems to be working.
\d{0,2}\/\d{0,2}\/\d{0,2}\s\d{0,2}\:\d{0,2}\:\d{0,2}\.\d{0,4}[ \t]+DEBUG\:[ \t]+\(Radio\)\sThree\.Link\sresp\:[ \t]+-?[\d]{0,4}\s[A-Za-z]{0,4}\-\d{0,7}[ \t]+\d{0,6}\s\d{0,2}-?[A-Za-z]{0,3}\s\d{0,2}\:\d{0,2}\:\d{0,2}[ \t]+\d{0,6}\s\d{0,4}\.\d{0,4}\s\d{0,4}\.\d{0,4}\s-?\d{0,4}\.\d{0,4}\s-?\d{0,4}\.\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}\s-?\d{0,6}$
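A pattern this long is easier to trust after running it against sample lines outside NiFi. The Python sketch below does that, assuming the regex has to match each line in full; Python's `re` stands in for the Java engine here, which should behave the same for this particular pattern. `SIMPLE_PATTERN` is a hypothetical shorter alternative added only for comparison, not something used in the flow above.

```python
import re

# The regex from the post, split into adjacent raw strings for readability.
STRICT_PATTERN = re.compile(
    r"\d{0,2}\/\d{0,2}\/\d{0,2}\s\d{0,2}\:\d{0,2}\:\d{0,2}\.\d{0,4}[ \t]+DEBUG\:[ \t]+"
    r"\(Radio\)\sThree\.Link\sresp\:[ \t]+-?[\d]{0,4}\s[A-Za-z]{0,4}\-\d{0,7}[ \t]+"
    r"\d{0,6}\s\d{0,2}-?[A-Za-z]{0,3}\s\d{0,2}\:\d{0,2}\:\d{0,2}[ \t]+"
    r"\d{0,6}\s\d{0,4}\.\d{0,4}\s\d{0,4}\.\d{0,4}\s-?\d{0,4}\.\d{0,4}\s-?\d{0,4}\.\d{0,4}[ \t]+"
    r"-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}\s-?\d{0,6}$"
)

# Hypothetical shorter pattern: any line containing the literal marker.
SIMPLE_PATTERN = re.compile(r".*Three\.Link resp:.*")

sample_lines = [
    "09/10/18 20:06:07.581 DEBUG: (Radio) One.Link DONE",
    "09/10/18 20:06:07.963 34, 0, 0, 0, 0, -99",
    "09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request",
    "09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99",
    "09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99",
]

# Lines accepted by each pattern when the whole line must match.
strict_hits = [line for line in sample_lines if STRICT_PATTERN.fullmatch(line)]
simple_hits = [line for line in sample_lines if SIMPLE_PATTERN.fullmatch(line)]

print(len(strict_hits), len(simple_hits))
```

On these samples both patterns select the same lines. The strict pattern would additionally reject "Three.Link resp:" lines whose fields do not fit the expected layout, which may or may not be what you want.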
I would still like to test the flow in detail before I mark your solution as the Accepted Solution. I would really appreciate your patience in the meantime.
Will keep you posted.
Cheers,
Created 12-24-2019 04:43 AM
This is a very basic use case for NiFi. I would recommend that once you get the file into NiFi, you split it line by line. Once you have the log file splits, you do the match logic on each single line. Route the lines you want downstream and handle them accordingly. There are many ways to do this, and the fun part of NiFi is discovering what works best for you.
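The split-then-route idea can be sketched in Python as follows. This is only a conceptual model, not NiFi code: `LineRecord`, `split_lines`, `route` and the relationship names "matched" and "unmatched" are all invented for the illustration, with each record standing in for one per-line flowfile.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LineRecord:
    """Stand-in for one per-line flowfile: the text plus where it came from."""
    source: str
    line_number: int
    text: str


def split_lines(source: str, content: str) -> List[LineRecord]:
    """Split file content into one record per line, numbered from 1."""
    return [
        LineRecord(source, number, text)
        for number, text in enumerate(content.splitlines(), start=1)
    ]


def route(records: List[LineRecord], needle: str) -> Dict[str, List[LineRecord]]:
    """Send each record to 'matched' or 'unmatched' based on its own text."""
    routed: Dict[str, List[LineRecord]] = {"matched": [], "unmatched": []}
    for record in records:
        key = "matched" if needle in record.text else "unmatched"
        routed[key].append(record)
    return routed


content = "\n".join([
    "09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request",
    "09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99",
    "09/10/18 20:06:08.601 DEBUG: (Radio) One.Link DONE",
    "09/10/18 20:06:08.983 34, 0, 0, 0, 0, -99",
    "09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99",
])

records = split_lines("abc.txt", content)
routed = route(records, "Three.Link resp:")
print([r.line_number for r in routed["matched"]])
```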
Here is a NiFi Template I have that checks log files:
https://github.com/steven-dfheinz/NiFi-Templates/blob/master/Get_File_Demo.xml
If this answer helps solve your issue, please mark it as the Accepted Solution.
Created 12-24-2019 10:16 AM
@stevenmatison ... I will certainly take the template from your GitHub and test it out as well. Please give me some time to test this method too. I appreciate your advice a lot!
Cheers,
Created 12-25-2019 08:45 AM
Just a heads up:
Splitting the file into individual records may provide additional flexibility, but if the case is straightforward enough, I would recommend using processors (like RouteText) that avoid creating a flowfile for each line.
Created 12-25-2019 11:03 PM
@DennisJaheruddi ... Thanks very much for making Christmas merrier 🙂 I agree with your statement and have configured the flow accordingly. I am marking your reply as the accepted solution. Great advice, and kudos to you again.