how to read file content and extract specific lines in nifi from .txt log files


Explorer

Hi all,

 

I am new to NiFi and need guidance on achieving the desired result.

Scenario:

1. Multiple .txt log files

2. Each .txt log file contains many lines

 

Requirement:

1. Read each .txt log file and extract only those lines that contain "Three.Link resp:". The snippet below, for example, is from abc.txt:

09/10/18 20:06:07.581 DEBUG: (Radio) One.Link DONE
09/10/18 20:06:07.963 34, 0, 0, 0, 0, -99
09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request
09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99
09/10/18 20:06:08.601 DEBUG: (Radio) One.Link DONE
09/10/18 20:06:08.983 34, 0, 0, 0, 0, -99
09/10/18 20:06:09.600 DEBUG: (Radio) Two.Link request
09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99
09/10/18 20:06:09.611 DEBUG: (Radio) One.Link DONE

 

 

What I am trying to do is extract only those lines from abc.txt that contain "Three.Link resp:" and write them to another file containing only those lines, as shown below:

09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99
09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99

 

 

 

I used the ExtractText processor with the regex:

^.*Three.Link resp.*$

 

 

The expression works correctly on regex101.com, where it extracts the entire line from the text:

https://regex101.com/r/Ggtl74/2

But when I place the same regex in the ExtractText processor, it does not work at all.

Can anyone please advise how to achieve this? Why does the NiFi processor not seem to apply the regex, or am I misunderstanding something here?

 

Thanks in advance.

 

 

Accepted Solution

Re: how to read file content and extract specific lines in nifi from .txt log files

Expert Contributor

ExtractText is for pulling some text out of the content and putting it into an attribute, which does not sound like what you want. It also matches the regex against the whole flowfile, so again it is probably not what you want.
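
One likely reason the pattern behaves differently on regex101 than when it is applied to a whole flowfile: without multiline mode, `^` and `$` anchor to the start and end of the entire text rather than of each line. The sketch below uses Python's `re` module purely as an analogy (NiFi uses Java regular expressions), and it assumes the regex101 test had the multiline flag enabled, which the thread does not confirm. The sample content is a shortened version of the log in the question.

```python
import re

# Shortened sample of the log from the question (trailing fields trimmed).
content = (
    "09/10/18 20:06:07.581 DEBUG: (Radio) One.Link DONE\n"
    "09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request\n"
    "09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0\n"
    "09/10/18 20:06:08.601 DEBUG: (Radio) One.Link DONE\n"
)

pattern = r"^.*Three.Link resp.*$"

# Default mode: ^ matches only at the very start of the content and
# . does not cross newlines, so the pattern never reaches the third line.
whole_content_match = re.search(pattern, content)

# Multiline mode: ^ and $ also match at line boundaries.
per_line_matches = re.findall(pattern, content, flags=re.MULTILINE)

print(whole_content_match)
print(per_line_matches)
```

Even with multiline matching, an attribute-oriented processor would only capture the text; it would not rewrite the content to contain just those lines.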

 

If you only want to keep certain lines from a flowfile, the processor to use seems to be RouteText.
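
Conceptually, keeping certain lines is a per-line filter over the content that produces one filtered output. The Python sketch below illustrates that idea only; it is not how RouteText is implemented, and `keep_matching_lines` is a hypothetical helper name.

```python
MARKER = "Three.Link resp:"


def keep_matching_lines(content: str, marker: str = MARKER) -> str:
    """Return only the lines of `content` that contain `marker`."""
    # keepends=True preserves each line's original line ending.
    kept = [
        line for line in content.splitlines(keepends=True) if marker in line
    ]
    return "".join(kept)


# Sample lines taken from the question.
sample = (
    "09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request\n"
    "09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99\n"
    "09/10/18 20:06:08.601 DEBUG: (Radio) One.Link DONE\n"
    "09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99\n"
)

filtered = keep_matching_lines(sample)
print(filtered, end="")
```

A plain substring test is enough here because the requirement is only that the line contains "Three.Link resp:".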

 

Here is an example of this: https://community.cloudera.com/t5/Support-Questions/Filtering-records-from-a-file-using-NiFi/td-p/18...


- Dennis Jaheruddin

If this answer helped, please mark it as 'solved' and/or if it is valuable for future readers please apply 'kudos'. Also check out my technical portfolio at https://portfolio.jaheruddin.nl


Re: how to read file content and extract specific lines in nifi from .txt log files

Explorer

@DennisJaheruddi ... This definitely helps. I tested it and it did extract the requisite lines. I created the regex below and applied it in the RouteText processor, and as of now it seems to be working.

\d{0,2}\/\d{0,2}\/\d{0,2}\s\d{0,2}\:\d{0,2}\:\d{0,2}\.\d{0,4}[ \t]+DEBUG\:[ \t]+\(Radio\)\sThree\.Link\sresp\:[ \t]+-?[\d]{0,4}\s[A-Za-z]{0,4}\-\d{0,7}[ \t]+\d{0,6}\s\d{0,2}-?[A-Za-z]{0,3}\s\d{0,2}\:\d{0,2}\:\d{0,2}[ \t]+\d{0,6}\s\d{0,4}\.\d{0,4}\s\d{0,4}\.\d{0,4}\s-?\d{0,4}\.\d{0,4}\s-?\d{0,4}\.\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}\s-?\d{0,6}$
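
A pattern this strict can silently drop a line if one field changes shape, so it may be worth checking it against known lines before relying on it. The sketch below runs the regex from this post through Python's `re` module and compares the result with a plain substring test. Python is used here only as an assumption for illustration (NiFi uses Java regular expressions), and the sample lines are copied from the question.

```python
import re

# The regex from the post, split across raw strings for readability only.
STRICT = re.compile(
    r"\d{0,2}\/\d{0,2}\/\d{0,2}\s\d{0,2}\:\d{0,2}\:\d{0,2}\.\d{0,4}"
    r"[ \t]+DEBUG\:[ \t]+\(Radio\)\sThree\.Link\sresp\:"
    r"[ \t]+-?[\d]{0,4}\s[A-Za-z]{0,4}\-\d{0,7}"
    r"[ \t]+\d{0,6}\s\d{0,2}-?[A-Za-z]{0,3}\s\d{0,2}\:\d{0,2}\:\d{0,2}"
    r"[ \t]+\d{0,6}\s\d{0,4}\.\d{0,4}\s\d{0,4}\.\d{0,4}"
    r"\s-?\d{0,4}\.\d{0,4}\s-?\d{0,4}\.\d{0,4}"
    r"[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}\s-?\d{0,6}$"
)

lines = [
    "09/10/18 20:06:07.581 DEBUG: (Radio) One.Link DONE",
    "09/10/18 20:06:07.963 34, 0, 0, 0, 0, -99",
    "09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request",
    "09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99",
    "09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99",
]

# Lines accepted by the strict pattern versus a simple "contains" check.
strict_hits = [ln for ln in lines if STRICT.search(ln)]
substring_hits = [ln for ln in lines if "Three.Link resp:" in ln]

print(len(strict_hits), len(substring_hits))
print(strict_hits == substring_hits)
```

If the two lists ever differ on real data, the strict pattern is rejecting a "Three.Link resp:" line whose fields do not fit the expected layout, which may or may not be what is wanted.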

I would still like to test the flow in detail before I mark your solution as the accepted solution. I would really appreciate your patience in the meantime.

Will keep you posted.

Cheers,


Re: how to read file content and extract specific lines in nifi from .txt log files

This is a very basic use case for NiFi. I would recommend that once you get the file into NiFi, you split it line by line. Once you have the log file splits, you apply the match logic to each single line, route the lines you want downstream, and handle them accordingly. There are many ways to do this, and the fun part of NiFi is discovering what works best for you.

 

Here is a NiFi Template I have that checks log files:

 

https://github.com/steven-dfheinz/NiFi-Templates/blob/master/Get_File_Demo.xml

 

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

Thanks,

Steven


Re: how to read file content and extract specific lines in nifi from .txt log files

Explorer

@stevenmatison ... I will certainly take the template from your GitHub and test it out as well. Please give me some time to test this method too. I appreciate your advice a lot!

 

Cheers,


Re: how to read file content and extract specific lines in nifi from .txt log files

Expert Contributor

Just a heads up: 

Splitting the file into individual records may provide additional flexibility, but if the case is straightforward enough, I do think it is recommended to use processors (like RouteText) that avoid creating a flowfile for each line.


- Dennis Jaheruddin


Re: how to read file content and extract specific lines in nifi from .txt log files

Explorer

@DennisJaheruddi ... Thanks very much for making Christmas merrier. I agree with your statement and have configured the flow accordingly. I am marking your reply as the accepted solution. Great advice, and kudos to you again.
