Member since
12-19-2019
01-13-2021
07:08 PM
@MattWho Thanks so much for going the extra mile and sharing the template file. I appreciate it very much. However, the problem is still there. I totally agree with you that the number of occurrences found in one input file should not split the flow file that many times, but trust me, I feel like the server is haunted and is acting weird.
Can you test one more time in your test flow, making one small change, to see if you can replicate the issue?
- Instead of using the GenerateFlowFile processor, please copy the data from the regex site and put it inside a text file (*.txt)
- Delete the GenerateFlowFile processor and add a "GetFile" processor
- Point the Input Directory to the path where you saved that *.txt file with the content taken from the regex site
- Connect the "GetFile" processor to the "RouteText" processor with the settings used previously, i.e. Routing Strategy: Route to each matching Property Name, and Matching Strategy: Matches Regular Expression
- Please use any of the regex expressions shared before
- Run the flow
Can you please let me know if you still get one flow file on the matched relationship? If you cannot replicate the issue and still get one flow file as output on the matched relationship, then I am sure my office server is haunted. I might have to write a Python script and put it in an ExecuteStreamCommand processor to extract the text from the file and put it in a location. That's the workaround coming to my mind right now. Thanks so much again, Matt, for all the help.
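If the flow cannot be made to work on that server, the Python workaround mentioned above could look roughly like the sketch below. It assumes the script is launched by ExecuteStreamCommand, which pipes the FlowFile content to the script's standard input and uses the script's standard output as the outgoing content. The names `ROW`, `filter_rows` and `main` are hypothetical, and the pattern is an illustrative guess based on the sample table rows, not one of the expressions shared in this thread.

```python
import re
import sys

# Hypothetical pattern (an assumption, not taken from the thread): an optional
# "*" marker, a 1-2 digit index, then a name made of hyphen-separated parts.
ROW = re.compile(r"^\s*\*?\s*\d{1,2}\s+\w+-\w+-\w*\s")


def filter_rows(lines):
    """Yield only the lines that look like table rows."""
    for line in lines:
        if ROW.match(line):
            yield line


def main(source=sys.stdin, sink=sys.stdout):
    """Copy matching lines from source to sink, so they stay in one output."""
    for line in filter_rows(source):
        sink.write(line)
```

A deployed script would end with a call to `main()`. Because every matching line is written to the same output stream, the result is a single piece of content rather than one file per matched line.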
01-13-2021
07:01 AM
@MattWho Hi Matt, many thanks for sharing your views. Please see my replies below to your questions:
- It may be helpful if you shared your RouteText processor configuration.
--> I have the exact same configuration as shown in your image, but instead of "lines" I named the property "matched". That's the only difference. (Pardon me for not being able to share the actual image; the system I worked on is the actual server, which has no internet connection.)
- Correct me if I am wrong, but you are looking to have all lines (minus the header lines) placed in a new FlowFile by themselves.
--> Yes
- The above would result in a FlowFile with only lines 0, 1, and 16. The header plus lines 2 and 15 would route to unmatched because of the leading "*" which does not match your regex.
--> Yes, that's correct. It would omit lines 2 and 15, but I need them as well. Note that the line
* 2 abc-rr1-0 35185 20-Dec 03:43:54
has a space after the "*" and then the digit 2, whereas the line
*15 abc-lr2-0 34686 20-Dec 03:43:54
has no space between the "*" and the digits 15. To include both, I reconfigured the regex as shared below:
(\s|[ \t]|\*)\d{1,2}\s\w{1,4}\-\w{1,4}\-(\w{1,2}|\w{0})\s\d{1,5}\s\d{1,2}\-\w{1,3}\s\d{1,2}\:\d{1,2}\:\d{1,2}
Please see the link below for the regex I made (not sure if you will like it 😞 but I tried):
https://regex101.com/r/pdo6Ca/1
2. I noticed your sample data has leading and trailing whitespace, so make sure the processor is configured to ignore those.
--> I reworked the regex (link above: https://regex101.com/r/pdo6Ca/1) to include/ignore those. Please share your views if this regex is wrong.
3. Since your intent is to produce a new FlowFile with only the lines matching the regex, make sure you set the above Routing Strategy.
--> Yes, it is set exactly as shown in your image. But instead of creating one flow file with all matched lines, it creates multiple flow files with the same name, each containing one matched line, and sends them to the "unmatched" relationship.
4. Make sure the correct matching strategy is selected. Should be what I have above.
--> Yes, it is set exactly as shown in your image.
5. Click on the "+" to add a new dynamic property for your regex. The property name becomes a new relationship on the processor where your matching lines will be routed.
--> Yes, correct.
6. Since you are evaluating the source FlowFile content line-by-line, make sure your regex does not have a line return at the end of it.
--> I double-checked and there is no extra line return.
After all this, the issue I am having is that when the flow file is sent to this processor, nothing is routed to the "lines" relationship (named "matched" in my processor). Instead, two things happen:
- It creates flow files with the same name (e.g. abc.txt) multiple times on the "unmatched" relationship. So if the regex matches 7 lines from the input file coming from the previous processor, it creates 7 abc.txt flow files routed to "unmatched", each containing 1 matched line (a different matched line in each).
- It creates 1 flow file sent to the "original" relationship.
It would be great if you could share your advice on the first point in particular: why 7 matched lines end up as 7 separate abc.txt flow files on "unmatched" instead of one flow file on "matched". Thank you again for your time and advice. Regards,
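The whitespace and "*" handling discussed above can be checked outside NiFi. The Python sketch below compares the revised expression with a more tolerant one when the whole line has to be covered, which is how the "Matches Regular Expression" strategy is assumed to behave here. `TOLERANT` is a hypothetical alternative and not something proposed in the thread; the rows are copied from the post without leading whitespace, which may differ from the real file. It only looks at the pattern itself and says nothing about why flow files end up on "unmatched".

```python
import re

# The revised expression from the post, unchanged.
REVISED = re.compile(
    r"(\s|[ \t]|\*)\d{1,2}\s\w{1,4}\-\w{1,4}\-(\w{1,2}|\w{0})\s\d{1,5}\s"
    r"\d{1,2}\-\w{1,3}\s\d{1,2}\:\d{1,2}\:\d{1,2}"
)

# Hypothetical alternative (an assumption): the "*" marker and any
# surrounding whitespace are optional instead of required.
TOLERANT = re.compile(
    r"\s*\*?\s*\d{1,2}\s+\w{1,4}-\w{1,4}-\w{0,2}\s+\d{1,5}\s+"
    r"\d{1,2}-\w{1,3}\s+\d{1,2}:\d{1,2}:\d{1,2}\s*"
)

ROWS = [
    "0 abc-lr1-0 35189 20-Dec 03:43:54",
    "1 abc-rr2-g 35209 20-Dec 03:43:54",
    "* 2 abc-rr1-0 35185 20-Dec 03:43:54",
    "*15 abc-lr2-0 34686 20-Dec 03:43:54",
    "16 abc-lr1-0 34631 20-Dec 03:43:54",
]
HEADERS = [
    "wwwwww aa cc",
    "# Name foo Since ddd/www dddd",
    "-- --------- ----- --------------- --- --- ---------",
]

# fullmatch() requires the pattern to cover the entire line.
revised_hits = [bool(REVISED.fullmatch(row)) for row in ROWS]
tolerant_hits = [bool(TOLERANT.fullmatch(row)) for row in ROWS]
header_hits = [bool(TOLERANT.fullmatch(h)) for h in HEADERS]

for row, old, new in zip(ROWS, revised_hits, tolerant_hits):
    print(f"{row!r}: revised={old} tolerant={new}")
```

The revised expression always consumes exactly one leading character (whitespace or "*"), so its result depends on whether each line still carries leading whitespace when it is evaluated. Making the marker and the whitespace optional removes that dependency.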
01-12-2021
04:10 PM
Hi,
I have multiple incoming files, each containing several lines of text, and I need certain lines from each file to build a table in later stages of the flow. The regex I wrote works on regex101.com, but when I put the same regex in the NiFi RouteText processor, it fails to extract anything. I don't understand whether the regex is wrong or the flow is wrong. Can someone please advise?
Below are some of the lines from the text file. I need the lines numbered 0 to 16 extracted and written to another text file that contains only those lines and keeps the name of the file they were read from.
wwwwww aa cc
# Name foo Since ddd/www dddd
-- --------- ----- --------------- --- --- ---------
0 abc-lr1-0 35189 20-Dec 03:43:54
1 abc-rr2-g 35209 20-Dec 03:43:54
* 2 abc-rr1-0 35185 20-Dec 03:43:54
*15 abc-lr2-0 34686 20-Dec 03:43:54
16 abc-lr1-0 34631 20-Dec 03:43:54
Below is the regex I wrote:
\d{0,2}\sabc-\w{0,2}\d{0,2}-\d{0,2}\w{0,2}\s\d{0,6}\s\d{0,2}-\w{0,3}\s\d{0,2}\:\d{0,2}\:\d{0,2}
Somehow the RouteText processor isn't able to recognise the regex. Any help is appreciated. Thanks in advance.
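One way to see why regex101.com and RouteText can disagree is to compare a substring search with a whole-line match. The Python sketch below only approximates the processor. It assumes that regex101 reports a match when the pattern is found anywhere in a line, while the "Matches Regular Expression" strategy requires the pattern to cover the entire line. The sample lines are copied from the post; their exact original whitespace is not known.

```python
import re

# The expression from the post, unchanged (split only for readability).
PATTERN = re.compile(
    r"\d{0,2}\sabc-\w{0,2}\d{0,2}-\d{0,2}\w{0,2}\s\d{0,6}\s"
    r"\d{0,2}-\w{0,3}\s\d{0,2}\:\d{0,2}\:\d{0,2}"
)

LINES = [
    "0 abc-lr1-0 35189 20-Dec 03:43:54",
    "1 abc-rr2-g 35209 20-Dec 03:43:54",
    "* 2 abc-rr1-0 35185 20-Dec 03:43:54",
    "*15 abc-lr2-0 34686 20-Dec 03:43:54",
    "16 abc-lr1-0 34631 20-Dec 03:43:54",
]

# search(): the pattern may start anywhere inside the line.
found_anywhere = [bool(PATTERN.search(line)) for line in LINES]
# fullmatch(): the pattern must account for every character of the line.
covers_whole_line = [bool(PATTERN.fullmatch(line)) for line in LINES]

for line, found, whole in zip(LINES, found_anywhere, covers_whole_line):
    print(f"{line!r}: search={found} fullmatch={whole}")
```

Under these assumptions the search succeeds on every row, while the whole-line match rejects the two rows that start with "*", because nothing in the expression accounts for that character.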
Labels:
- Apache NiFi
12-25-2019
11:03 PM
@DennisJaheruddi Thanks very much for making Christmas merrier 🙂 I agree with your statement and have configured the flow accordingly. I am marking your reply as the accepted solution. Great advice, and kudos to you again.
12-24-2019
10:16 AM
@stevenmatison I will certainly take the template from your GitHub and test it out as well. Please give me some time to test this method. I appreciate your advice a lot! Cheers,
12-24-2019
10:12 AM
@DennisJaheruddi This definitely helps. I tested it and it seems to extract the requisite lines. I created the regex below and applied it to the RouteText processor, and as of now it seems to be working:
\d{0,2}\/\d{0,2}\/\d{0,2}\s\d{0,2}\:\d{0,2}\:\d{0,2}\.\d{0,4}[ \t]+DEBUG\:[ \t]+\(Radio\)\sThree\.Link\sresp\:[ \t]+-?[\d]{0,4}\s[A-Za-z]{0,4}\-\d{0,7}[ \t]+\d{0,6}\s\d{0,2}-?[A-Za-z]{0,3}\s\d{0,2}\:\d{0,2}\:\d{0,2}[ \t]+\d{0,6}\s\d{0,4}\.\d{0,4}\s\d{0,4}\.\d{0,4}\s-?\d{0,4}\.\d{0,4}\s-?\d{0,4}\.\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}\s-?\d{0,6}$
I would still like to test the flow in detail before I mark your solution as the accepted solution. I would really appreciate your patience. Will keep you posted. Cheers,
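A quick way to sanity-check an expression of this length outside NiFi is to run it against a few sample log lines. The Python sketch below does that with lines taken from the original question and compares the result with a plain substring test. It assumes single spaces between fields, as the lines appear in the post; the real log files may use different spacing. The names `STRICT`, `strict_hits` and `simple_hits` are made up for this illustration.

```python
import re

# The expression from the post, unchanged (split only for readability).
STRICT = re.compile(
    r"\d{0,2}\/\d{0,2}\/\d{0,2}\s\d{0,2}\:\d{0,2}\:\d{0,2}\.\d{0,4}"
    r"[ \t]+DEBUG\:[ \t]+\(Radio\)\sThree\.Link\sresp\:"
    r"[ \t]+-?[\d]{0,4}\s[A-Za-z]{0,4}\-\d{0,7}"
    r"[ \t]+\d{0,6}\s\d{0,2}-?[A-Za-z]{0,3}\s\d{0,2}\:\d{0,2}\:\d{0,2}"
    r"[ \t]+\d{0,6}\s\d{0,4}\.\d{0,4}\s\d{0,4}\.\d{0,4}"
    r"\s-?\d{0,4}\.\d{0,4}\s-?\d{0,4}\.\d{0,4}"
    r"[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}[ \t]+-?\d{0,4}\s-?\d{0,6}$"
)

LINES = [
    "09/10/18 20:06:07.581 DEBUG: (Radio) One.Link DONE",
    "09/10/18 20:06:07.963 34, 0, 0, 0, 0, -99",
    "09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request",
    "09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 "
    "10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99",
    "09/10/18 20:06:08.601 DEBUG: (Radio) One.Link DONE",
]

# Whole-line match with the long expression.
strict_hits = [line for line in LINES if STRICT.fullmatch(line)]
# Plain substring test on the literal marker text.
simple_hits = [line for line in LINES if "Three.Link resp:" in line]

print(len(strict_hits), len(simple_hits))
```

On this sample both approaches keep the same line. The long expression additionally validates the shape of every field, which is stricter but also more fragile if the log format varies.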
12-19-2019
08:10 PM
Hi all,
I am new to NiFi and need guidance on achieving the desired result.
Scenario:
1. Multiple .txt log files
2. each .txt log file contains many lines
Requirement:
1. Read each .txt log file and extract only those lines that contain "Three.Link resp:". The snippet below, for example, is from abc.txt:
09/10/18 20:06:07.581 DEBUG: (Radio) One.Link DONE
09/10/18 20:06:07.963 34, 0, 0, 0, 0, -99
09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request
09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99
09/10/18 20:06:08.601 DEBUG: (Radio) One.Link DONE
09/10/18 20:06:08.983 34, 0, 0, 0, 0, -99
09/10/18 20:06:09.600 DEBUG: (Radio) Two.Link request
09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99
09/10/18 20:06:09.611 DEBUG: (Radio) One.Link DONE
What I am trying to do is extract only those lines from abc.txt that contain "Three.Link resp:" and write them to another file containing only those lines, as shown below:
09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99
09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99
I used the ExtractText processor with the regex:
^.*Three.Link resp.*$
On regex101.com this regex correctly matches each entire line:
https://regex101.com/r/Ggtl74/2
But when I place the same regex in the ExtractText processor, it does not work at all.
Can anyone please advise how to achieve this? Why does the NiFi processor not seem to apply the regex, or am I misunderstanding something here?
Thanks in advance.
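One possible explanation, sketched in Python rather than NiFi: the anchors `^` and `$` behave differently depending on whether multiline mode is on. The sketch assumes that the regex101 test ran with the multiline flag enabled and that ExtractText evaluates the expression against the whole content with multiline mode off unless it is switched on; neither has been verified against the flow described here.

```python
import re

LOG = "\n".join([
    "09/10/18 20:06:08.591 DEBUG: (Radio) Two.Link request",
    "09/10/18 20:06:08.601 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 "
    "10-Sep 12:06:02 0 .00 .00 .00 .00 0 0 0 -99 -99",
    "09/10/18 20:06:08.601 DEBUG: (Radio) One.Link DONE",
    "09/10/18 20:06:09.600 DEBUG: (Radio) Two.Link request",
    "09/10/18 20:06:09.611 DEBUG: (Radio) Three.Link resp: -1 abc-34664 0 "
    "10-Sep 12:06:03 0 .00 .00 .00 .00 0 0 0 -99 -99",
    "09/10/18 20:06:09.611 DEBUG: (Radio) One.Link DONE",
])

PATTERN = r"^.*Three.Link resp.*$"

# Without MULTILINE, "^" only matches at the very start of the text and
# "." does not cross line breaks, so the first line has to match.
whole_text = re.findall(PATTERN, LOG)

# With MULTILINE, "^" and "$" match at the start and end of every line.
per_line = re.findall(PATTERN, LOG, flags=re.MULTILINE)

print(len(whole_text), len(per_line))
```

Here the whole-text attempt finds nothing because the first line is not a "Three.Link resp:" line, while the per-line attempt finds both of them.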
Labels:
- Apache NiFi