<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Unable to extract status code from Web Server Logs in Nifi in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-extract-status-code-from-Web-Server-Logs-in-Nifi/m-p/122824#M85577</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I am using NiFi to extract attributes such as the IP address, timestamp, request type, and status code from web server logs. Here is a sample of my data:&lt;/P&gt;&lt;P&gt;133.43.96.45 - - [01/Aug/1995:00:00:16 -0400] "GET /shuttle/missions/sts-69/mission-sts-69.html HTTP/1.0" 200 10566&lt;/P&gt;&lt;P&gt;I am using regular expressions in the ExtractText processor to do this. I can extract the IP address, timestamp, and request type, but not the status code, which is 200 in this case. I am currently using (\\d{3}), but it is not working. Has anyone tried this before?&lt;/P&gt;</description>
    <pubDate>Tue, 11 Oct 2016 00:00:40 GMT</pubDate>
    <dc:creator>mrizvi</dc:creator>
    <dc:date>2016-10-11T00:00:40Z</dc:date>
    <item>
      <title>Unable to extract status code from Web Server Logs in Nifi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-extract-status-code-from-Web-Server-Logs-in-Nifi/m-p/122824#M85577</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I am using NiFi to extract attributes such as the IP address, timestamp, request type, and status code from web server logs. Here is a sample of my data:&lt;/P&gt;&lt;P&gt;133.43.96.45 - - [01/Aug/1995:00:00:16 -0400] "GET /shuttle/missions/sts-69/mission-sts-69.html HTTP/1.0" 200 10566&lt;/P&gt;&lt;P&gt;I am using regular expressions in the ExtractText processor to do this. I can extract the IP address, timestamp, and request type, but not the status code, which is 200 in this case. I am currently using (\\d{3}), but it is not working. Has anyone tried this before?&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2016 00:00:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-extract-status-code-from-Web-Server-Logs-in-Nifi/m-p/122824#M85577</guid>
      <dc:creator>mrizvi</dc:creator>
      <dc:date>2016-10-11T00:00:40Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to extract status code from Web Server Logs in Nifi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-extract-status-code-from-Web-Server-Logs-in-Nifi/m-p/122825#M85578</link>
      <description>&lt;P&gt;
	Hi,&lt;/P&gt;&lt;P&gt;
	I'm assuming that you are using multiple capture groups to extract each piece of information. Can you explain what "it is not working" looks like in your situation? Is it capturing nothing, capturing different values than you expected, or throwing an exception? One possibility is that your expression is not focused enough -- if that is the complete expression, it would capture "133" first (as well as "199" and "040" before getting to "200"). If you know the log format will remain consistent, you might want to try something like 
	&lt;CODE&gt;HTTP\/\d\.\d" (\d{3})&lt;/CODE&gt;. Please let us know if you have any more information and if this solves your problem.&lt;/P&gt;&lt;P&gt;
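	To see the difference between the bare three-digit pattern and the focused one outside NiFi, here is a minimal Python sketch. It is only an illustration: the names LOG_LINE, LOOSE, FOCUSED, and extract_status are hypothetical, the sample line is copied from the question, and it assumes that Python's re module handles these two simple patterns the same way as the Java regular expressions ExtractText evaluates. The forward slash is left unescaped here, which should match the same text.&lt;/P&gt;
&lt;PRE&gt;
```python
import re

# Sample line copied from the question.
LOG_LINE = ('133.43.96.45 - - [01/Aug/1995:00:00:16 -0400] '
            '"GET /shuttle/missions/sts-69/mission-sts-69.html HTTP/1.0" 200 10566')

# Bare three-digit pattern: it matches the first run of three digits it finds.
LOOSE = re.compile(r'(\d{3})')

# Focused pattern: anchored on the protocol version and the closing quote,
# so the only capture group is the status code.
FOCUSED = re.compile(r'HTTP/\d\.\d" (\d{3})')


def extract_status(line):
    """Return the status code as a string, or None if the line does not match."""
    match = FOCUSED.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Every non-overlapping three-digit run, in order of appearance.
    print(LOOSE.findall(LOG_LINE))
    # Group 0 is the whole match, group 1 is the status code.
    print(FOCUSED.search(LOG_LINE).group(0))
    print(extract_status(LOG_LINE))
```
&lt;/PRE&gt;&lt;P&gt;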
	Update: I tested this expression and was able to get the following output:&lt;/P&gt;
&lt;PRE&gt;--------------------------------------------------
Standard FlowFile Attributes
Key: 'entryDate'
	Value: 'Mon Oct 10 12:18:27 PDT 2016'
Key: 'lineageStartDate'
	Value: 'Mon Oct 10 12:18:27 PDT 2016'
Key: 'fileSize'
	Value: '115'
FlowFile Attribute Map Content
Key: 'HTTP response'
	Value: '200'
Key: 'HTTP response.0'
	Value: 'HTTP/1.0" 200'
Key: 'HTTP response.1'
	Value: '200'
Key: 'filename'
	Value: '787130965602970'
Key: 'path'
	Value: './'
Key: 'uuid'
	Value: 'ccb6f333-de33-4037-9a1a-aa9ce7f2ef32'
--------------------------------------------------
133.43.96.45 - - [01/Aug/1995:00:00:16 -0400] "GET /shuttle/missions/sts-69/mission-sts-69.html HTTP/1.0" 200 10566
&lt;/PRE&gt;&lt;P&gt;I uploaded the template I used here: &lt;A target="_blank" href="https://gist.github.com/alopresto/f79604a8f0e803defac54b048d7a6b4d"&gt;ExtractText Regex Template&lt;/A&gt;. &lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2016 02:11:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-extract-status-code-from-Web-Server-Logs-in-Nifi/m-p/122825#M85578</guid>
      <dc:creator>alopresto</dc:creator>
      <dc:date>2016-10-11T02:11:43Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to extract status code from Web Server Logs in Nifi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-extract-status-code-from-Web-Server-Logs-in-Nifi/m-p/122826#M85579</link>
      <description>&lt;P&gt;Thank you so much &lt;A rel="user" href="https://community.cloudera.com/users/595/alopresto.html" nodeid="595"&gt;@Andy LoPresto&lt;/A&gt;, it worked. It was capturing nothing earlier, perhaps because of the other three-digit numbers. The log format is consistent throughout the file, so yes, the workflow flowed like water &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2016 04:16:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-extract-status-code-from-Web-Server-Logs-in-Nifi/m-p/122826#M85579</guid>
      <dc:creator>mrizvi</dc:creator>
      <dc:date>2016-10-11T04:16:51Z</dc:date>
    </item>
  </channel>
</rss>

