How to read file content and extract specific lines from .txt log files in NiFi

Explorer

Hi all,

I am new to NiFi, so I need guidance on achieving the desired result.

Scenario:

1. There are multiple .txt log files.

2. Each .txt log file contains many lines.

Requirement:

1. Read each .txt log file and extract only the lines that contain "Three.Link resp:". The snippet below, for example, is from abc.txt:

09/10/18 20:06:07.581 DEBUG: (Radio) One.Link DONE
09/10/18 20:06:07.963 34, 0, 0, 0, 0, -99
09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request
09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99
09/10/18 20:06:08.601 DEBUG: (Radio) One.Link DONE
09/10/18 20:06:08.983 34, 0, 0, 0, 0, -99
09/10/18 20:06:09.600 DEBUG: (Radio) Two.Link request
09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99
09/10/18 20:06:09.611 DEBUG: (Radio) One.Link DONE

What I am trying to do is extract only the lines from abc.txt that contain "Three.Link resp:" and write them to another file that contains only those lines, as shown below:

09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99
09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99
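
For reference, the transformation being asked for is a per-line substring filter applied to every .txt file. Below is a minimal Python sketch of that logic outside NiFi, assuming UTF-8 text; `filter_lines`, `filter_logs`, `in_dir` and `out_dir` are hypothetical names for illustration, not NiFi components. Inside NiFi, the same job would roughly map to picking up the files (e.g. GetFile), filtering the lines, and writing the result (e.g. PutFile).

```python
from pathlib import Path

# Literal substring to look for (plain text, not a regular expression).
MARKER = "Three.Link resp:"


def filter_lines(text: str, marker: str = MARKER) -> str:
    """Return only the lines of `text` that contain `marker`, keeping their line endings."""
    return "".join(
        line for line in text.splitlines(keepends=True) if marker in line
    )


def filter_logs(in_dir: Path, out_dir: Path, marker: str = MARKER) -> None:
    """Write a filtered copy of every .txt file in `in_dir` to `out_dir` under the same name."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_dir.glob("*.txt")):
        text = src.read_text(encoding="utf-8")
        (out_dir / src.name).write_text(filter_lines(text, marker), encoding="utf-8")
```

Usage, assuming hypothetical `logs/` and `filtered/` folders: `filter_logs(Path("logs"), Path("filtered"))`. Because the marker is matched as a plain substring, the `.` in `Three.Link` needs no escaping here.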

I used the ExtractText processor with the following regex expression:

^.*Three.Link resp.*$

It appears to work correctly: on regex101.com the expression extracts the entire matching line from the text:

https://regex101.com/r/Ggtl74/2

But when I put the same regex expression into the ExtractText processor, it does not work at all.

Can anyone please advise how to achieve this? Why does the NiFi processor not seem to apply the regex expression, or am I misunderstanding something here?

Thanks in advance.

1 ACCEPTED SOLUTION

ExtractText is for getting some text from the content and putting it in an attribute. That does not sound like what you want. Also, it matches the regex against the whole flowfile content, so again that is probably not what you want.
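
A likely reason the same pattern behaved differently on regex101.com: regex101 typically has the multiline (`m`) flag enabled by default, so `^` and `$` match at every line boundary. Applied to the whole flowfile content without multiline mode, `^` and `$` anchor to the start and end of the entire text instead. The Python sketch below illustrates this with the `re` module, which is assumed here to behave like `java.util.regex` for this pattern and input. ExtractText does appear to offer an "Enable Multiline Mode" property (verify for your NiFi version), but even with it the matches end up in attributes rather than replacing the flowfile content.

```python
import re

# A shortened copy of the abc.txt snippet from the question.
LOG = (
    "09/10/18 20:06:07.581 DEBUG: (Radio) One.Link DONE\n"
    "09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99\n"
    "09/10/18 20:06:08.601 DEBUG: (Radio) One.Link DONE\n"
    "09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99\n"
)

# The pattern from the question, with the literal dot escaped ("." alone matches any character).
PATTERN = r"^.*Three\.Link resp.*$"

# Without MULTILINE, ^ only matches at the very start of the text and "." does not
# cross newlines, so the first line (which lacks the marker) makes every attempt fail.
whole_text = re.findall(PATTERN, LOG)

# With MULTILINE, ^ and $ match at each line boundary, as on regex101 by default.
per_line = re.findall(PATTERN, LOG, flags=re.MULTILINE)

print(whole_text)     # []
print(len(per_line))  # 2
```

With the flag, both "Three.Link resp" lines are found; without it, nothing matches because the first line of the file does not contain the marker.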

If you only want to keep certain lines from a flowfile, RouteText seems to be the processor to use.
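
To illustrate what line-wise routing does, here is a small Python emulation: each line is tested on its own and written either to a "matched" or an "unmatched" output, so one input file produces at most two outputs instead of one per line. `route_text` and `contains_marker` are hypothetical helpers, not NiFi APIs. In RouteText itself the condition would typically be a user-defined property (for example one whose value is `Three.Link resp:`) combined with a matching strategy such as "Contains"; check the processor documentation for the exact property names in your version.

```python
from typing import Callable, Tuple


def route_text(content: str, condition: Callable[[str], bool]) -> Tuple[str, str]:
    """Emulate line-wise routing: return (matched, unmatched) contents.

    Each line keeps its original line ending, so both outputs are valid text files.
    """
    matched, unmatched = [], []
    for line in content.splitlines(keepends=True):
        # Evaluate the condition on the line text without its trailing newline.
        target = matched if condition(line.rstrip("\r\n")) else unmatched
        target.append(line)
    return "".join(matched), "".join(unmatched)


def contains_marker(line: str) -> bool:
    """Hypothetical "Contains" condition with the value from the question."""
    return "Three.Link resp:" in line


log = (
    "09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request\n"
    "09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99\n"
    "09/10/18 20:06:08.601 DEBUG: (Radio) One.Link DONE\n"
)
matched, unmatched = route_text(log, contains_marker)
print(matched.count("\n"), unmatched.count("\n"))  # 1 2
```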

Here is an example of this: https://community.cloudera.com/t5/Support-Questions/Filtering-records-from-a-file-using-NiFi/td-p/18...


- Dennis Jaheruddin

If this answer helped, please mark it as 'solved' and/or if it is valuable for future readers please apply 'kudos'.

6 REPLIES

Explorer

@DennisJaheruddi ... This definitely helps. I tested it, and it does seem to extract the required lines. I created the regex expression below, applied it in the RouteText processor, and so far it seems to be working.

\d{0,2}\/\d{0,2}\/\d{0,2}\s\d{0,2}\:\d{0,2}\:\d{0,2}\.\d{0,4}[ \t]+DEBUG\:[ \t]+\(Radio\)\sThree\.Link\sresp\:[ \t]+-?[\d]{0,4}\s[A-Za-z]{0,4}\-\d{0,7}[ \t]+\d{0,6}\s\d{0,2}-?[A-Za-z]{0,3}\s\d{0,2}\:\d{0,2}\:\d{0,2}[ \t]+\d{0,6}\s\d{0,4}\.\d{0,4}\s\d{0,4}\.\d{0,4}\s-?\d{0,4}\.\d{0,4}\s-?\d{0,4}\.\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}\s-?\d{0,6}$
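
Before relying on a long pattern like this, it can help to test it against the sample lines. The Python sketch below (Python's `re` accepts the same escapes used here, such as `\:` and `\/`) checks that the posted pattern fully matches exactly the two "Three.Link resp:" lines from the abc.txt snippet, and compares it with a much shorter, hypothetical alternative that only searches for the marker text. Full-line matching is an assumption about how the pattern would be applied; adjust it if your RouteText matching strategy differs.

```python
import re

# The pattern posted above, copied verbatim into a raw string.
STRICT = re.compile(
    r"\d{0,2}\/\d{0,2}\/\d{0,2}\s\d{0,2}\:\d{0,2}\:\d{0,2}\.\d{0,4}[ \t]+DEBUG\:[ \t]+\(Radio\)\sThree\.Link\sresp\:[ \t]+-?[\d]{0,4}\s[A-Za-z]{0,4}\-\d{0,7}[ \t]+\d{0,6}\s\d{0,2}-?[A-Za-z]{0,3}\s\d{0,2}\:\d{0,2}\:\d{0,2}[ \t]+\d{0,6}\s\d{0,4}\.\d{0,4}\s\d{0,4}\.\d{0,4}\s-?\d{0,4}\.\d{0,4}\s-?\d{0,4}\.\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}\s-?\d{0,6}$"
)

# Hypothetical simpler alternative: only look for the marker (dot escaped).
SIMPLE = re.compile(r"Three\.Link resp:")

# The abc.txt snippet from the question.
SAMPLE_LINES = [
    "09/10/18 20:06:07.581 DEBUG: (Radio) One.Link DONE",
    "09/10/18 20:06:07.963 34, 0, 0, 0, 0, -99",
    "09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request",
    "09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99",
    "09/10/18 20:06:08.601 DEBUG: (Radio) One.Link DONE",
    "09/10/18 20:06:08.983 34, 0, 0, 0, 0, -99",
    "09/10/18 20:06:09.600 DEBUG: (Radio) Two.Link request",
    "09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99",
    "09/10/18 20:06:09.611 DEBUG: (Radio) One.Link DONE",
]

# fullmatch: the strict pattern must describe the entire line.
strict_hits = [line for line in SAMPLE_LINES if STRICT.fullmatch(line)]
# search: the simple pattern only needs to occur somewhere in the line.
simple_hits = [line for line in SAMPLE_LINES if SIMPLE.search(line)]

print(len(strict_hits), strict_hits == simple_hits)  # 2 True
```

The strict pattern also validates the field layout, so a resp line with an unexpected format would be dropped, while the short pattern keeps any line that contains the marker. Which trade-off fits depends on how much you trust the log format.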

I would still like to test the flow in detail before I mark your solution as the accepted solution. I would really appreciate your patience in the meantime.

Will keep you posted.

Cheers,

Super Guru

This is a very basic use case for NiFi. I would recommend that, once you get the file into NiFi, you split it line by line. Once you have the log file splits, apply the match logic to each single line. Route the lines you want downstream and handle them accordingly. There are many ways to do this, and the fun part of NiFi is discovering what works best for you.
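
The split-then-match idea can be sketched in plain Python to make the data flow explicit: the file is first broken into one piece per line, and each piece is then tested and kept or discarded independently. In NiFi this would typically be a splitting processor such as SplitText (one line per split) followed by a routing processor; `split_lines` and `route_each` below are hypothetical helpers, not NiFi components.

```python
from typing import List


def split_lines(content: str) -> List[str]:
    """Split one file's content into one piece per line (like a one-line-per-split step)."""
    return content.splitlines(keepends=True)


def route_each(pieces: List[str], marker: str = "Three.Link resp:") -> List[str]:
    """Test each single-line piece independently and keep only the matching ones."""
    return [piece for piece in pieces if marker in piece]


log = (
    "09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request\n"
    "09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99\n"
    "09/10/18 20:06:08.601 DEBUG: (Radio) One.Link DONE\n"
    "09/10/18 20:06:08.983 34, 0, 0, 0, 0, -99\n"
)
pieces = split_lines(log)
kept = route_each(pieces)
merged = "".join(kept)  # recombine the kept lines into one output file
print(len(pieces), len(kept))  # 4 1
```

Every line becomes its own intermediate piece (four for this four-line sample). That per-line overhead is the price of the extra flexibility, which is why a processor that filters lines within a single flowfile, such as RouteText, is suggested elsewhere in this thread for simple cases.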

Here is a NiFi Template I have that checks log files:

https://github.com/steven-dfheinz/NiFi-Templates/blob/master/Get_File_Demo.xml

If this answer helps solve your issue, please mark it as Accepted Solution.

Explorer

@stevenmatison ... I will surely take the template from your GitHub and test it out as well. Please give me some time to test this method too. I appreciate your advice a lot!

Cheers,

Just a heads up:

Splitting the file into individual records may provide additional flexibility, but if the case is straightforward enough, I recommend using processors (like RouteText) that avoid creating a flowfile for each line.


- Dennis Jaheruddin

Explorer

@DennisJaheruddi ... Thanks very much for making Christmas merrier 🙂 I agree with your statement and have configured the flow accordingly. I am marking your reply as the accepted solution. Great advice, and kudos to you again.