Unable to extract status code from Web Server Logs in Nifi
- Labels: Apache NiFi
Created ‎10-10-2016 05:00 PM
Hi all,
I am using NiFi to extract attributes such as IP, timestamp, request type, and status code from web server logs. Here is a sample of my data:
133.43.96.45 - - [01/Aug/1995:00:00:16 -0400] "GET /shuttle/missions/sts-69/mission-sts-69.html HTTP/1.0" 200 10566
I am using a regex in the ExtractText processor for this. I can extract the IP, timestamp, and request type, but not the status code, which is 200 in this case. I am currently using (\\d{3}), but it is not working. Has anyone tried this before?
Created ‎10-10-2016 07:11 PM
Hi,
I'm assuming that you are using multiple capture groups to extract each piece of information. Can you explain what "it is not working" looks like in your situation? Is it capturing nothing, capturing different values than you expected, or throwing an exception? One possibility is that your expression is not focused enough: if that is the complete expression, it would capture "133" first (and then "199" and "040" before getting to "200"). If you know the log format will remain consistent, you might want to try something like:
HTTP\/\d\.\d" (\d{3})
Please let us know if you have any more information and whether this solves your problem.
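To make the difference concrete, here is a minimal sketch in Python. NiFi evaluates Java regular expressions, so this only illustrates how the two patterns behave on the sample line; it is not NiFi itself. The variable names are made up for the example.

```python
import re

# Sample line from the original question.
line = ('133.43.96.45 - - [01/Aug/1995:00:00:16 -0400] '
        '"GET /shuttle/missions/sts-69/mission-sts-69.html HTTP/1.0" 200 10566')

# Unanchored pattern: every non-overlapping run of three digits is a candidate,
# and the first one is the start of the IP address.
three_digit_runs = re.findall(r'\d{3}', line)
first_unanchored = re.search(r'(\d{3})', line).group(1)

# Anchored pattern from the answer: only the three digits that follow
# the protocol token and closing quote are captured.
status_code = re.search(r'HTTP\/\d\.\d" (\d{3})', line).group(1)

print(three_digit_runs)   # ['133', '199', '040', '200', '105']
print(first_unanchored)   # 133
print(status_code)        # 200
```

Anchoring on text that reliably precedes the status code is what keeps the pattern from picking up the IP address or the year.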
Update: I tested this expression and was able to get the following output:
--------------------------------------------------
Standard FlowFile Attributes
Key: 'entryDate' Value: 'Mon Oct 10 12:18:27 PDT 2016'
Key: 'lineageStartDate' Value: 'Mon Oct 10 12:18:27 PDT 2016'
Key: 'fileSize' Value: '115'
FlowFile Attribute Map Content
Key: 'HTTP response' Value: '200'
Key: 'HTTP response.0' Value: 'HTTP/1.0" 200'
Key: 'HTTP response.1' Value: '200'
Key: 'filename' Value: '787130965602970'
Key: 'path' Value: './'
Key: 'uuid' Value: 'ccb6f333-de33-4037-9a1a-aa9ce7f2ef32'
--------------------------------------------------
133.43.96.45 - - [01/Aug/1995:00:00:16 -0400] "GET /shuttle/missions/sts-69/mission-sts-69.html HTTP/1.0" 200 10566
I uploaded the template I used here: ExtractText Regex Template.
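For readers wondering where the three 'HTTP response' attributes in that output come from, the sketch below mimics the naming pattern in Python. The helper `extract_attributes` is hypothetical and written only for this illustration; it reproduces what the output above shows (the whole match under `.0`, the first capture group under `.1` and under the bare property name) and is not NiFi's actual implementation.

```python
import re

def extract_attributes(name, pattern, text):
    """Hypothetical helper: mirror the attribute names seen in the output above.

    Assumes the pattern contains at least one capture group.
    """
    match = re.search(pattern, text)
    if match is None:
        return {}  # no match, so no attributes are added
    # name.0 is the whole match, name.1 .. name.N are the capture groups.
    attributes = {f'{name}.{i}': match.group(i)
                  for i in range(match.re.groups + 1)}
    # The bare name carries the first capture group.
    attributes[name] = match.group(1)
    return attributes

line = ('133.43.96.45 - - [01/Aug/1995:00:00:16 -0400] '
        '"GET /shuttle/missions/sts-69/mission-sts-69.html HTTP/1.0" 200 10566')

attrs = extract_attributes('HTTP response', r'HTTP\/\d\.\d" (\d{3})', line)
for key in sorted(attrs):
    print(f"Key: '{key}' Value: '{attrs[key]}'")
```

In this sketch the property name ('HTTP response') is just the dictionary key prefix; downstream steps would normally read the bare name to get the status code.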
Created ‎10-10-2016 09:16 PM
Thank you so much @Andy LoPresto, it worked. It was capturing nothing earlier, perhaps because of the other 3-digit numbers. The log format is consistent throughout the file, so yeah, the workflow flowed like water 🙂
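A side note on "capturing nothing": one possible explanation, not confirmed anywhere in this thread, is the doubled backslash in (\\d{3}). This assumes the property value was typed exactly as written in the question and is used as the regex text as-is, without the unescaping that a Java string literal would get. In that case `\\` means a literal backslash followed by three `d` characters, which never occurs in the log line. The Python sketch below shows the same behaviour with a raw string.

```python
import re

line = ('133.43.96.45 - - [01/Aug/1995:00:00:16 -0400] '
        '"GET /shuttle/missions/sts-69/mission-sts-69.html HTTP/1.0" 200 10566')

# Pattern text is (\\d{3}): a literal backslash, then the letter d three times.
doubled = re.search(r'(\\d{3})', line)

# Pattern text is (\d{3}): three digits.
single = re.search(r'(\d{3})', line)

# The doubled form only matches text that really contains a backslash and "ddd".
literal_hit = re.search(r'(\\d{3})', r'\ddd')

print(doubled)            # None
print(single.group(1))    # 133
print(literal_hit.group(1))
```

Either way, the anchored expression from the answer avoids the issue because it uses single backslashes and pins the match to the text before the status code.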
