<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Help with Hive Regex extract. in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Help-with-Hive-Regex-extract/m-p/191014#M59099</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;i have a firewall log with entries like this..&lt;/P&gt;&lt;P&gt;Mar 12 04:03:01 
172.16.3.1 
%ASA-6-106100
access-list FW-DATA 
permitted 
tcp 
FW-DATA
172.16.1.4
59289 
OUTSIDE
52.87.195.145
22
hit-cnt 1 first hit&lt;/P&gt;&lt;P&gt;i created an external table in hive for this log file and i am trying to use HIVE SQL and regexp_extract to extract column out of this lines by using this regular expression.&lt;/P&gt;&lt;P&gt;([A-Z][a-z]+ \d+\s\d+:\d+:\d+)\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(\%ASA-6-106100):\s+(\w+-\w+\s+\w+-\w+)\s+(\w+)\s+(\w+)\s+(\w+-\w+)\/?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\((\d+)\)\s+-&amp;gt;\s+(\w+-?\w+?)\/?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\((\d+)\)\s+(\w+-\w+\s+\d+\s+\w+\s+\w+)&lt;/P&gt;&lt;P&gt;when i tested this at different sites (like &lt;A href="https://regex101.com/)" target="_blank"&gt;https://regex101.com/)&lt;/A&gt; ..it is working fine..breaking this line into many different groups shown below. but when i try to use that in HIVE its not working..any help is appriciated.&lt;/P&gt;&lt;P&gt;Mar 12 04:03:01 &lt;/P&gt;&lt;P&gt;
172.16.3.1 &lt;/P&gt;&lt;P&gt;
%ASA-6-106100 &lt;/P&gt;&lt;P&gt;access-list FW-DATA &lt;/P&gt;&lt;P&gt;
permitted &lt;/P&gt;&lt;P&gt;
tcp &lt;/P&gt;&lt;P&gt;
FW-DATA &lt;/P&gt;&lt;P&gt;172.16.1.4 &lt;/P&gt;&lt;P&gt;59289 &lt;/P&gt;&lt;P&gt;
OUTSIDE &lt;/P&gt;&lt;P&gt;52.87.195.145 &lt;/P&gt;&lt;P&gt;22 &lt;/P&gt;&lt;P&gt;hit-cnt 1 first hit
&lt;/P&gt;</description>
    <pubDate>Thu, 06 Apr 2017 23:38:35 GMT</pubDate>
    <dc:creator>saikrishna_tara</dc:creator>
    <dc:date>2017-04-06T23:38:35Z</dc:date>
    <item>
      <title>Help with Hive Regex extract.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Help-with-Hive-Regex-extract/m-p/191014#M59099</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;i have a firewall log with entries like this..&lt;/P&gt;&lt;P&gt;Mar 12 04:03:01 
172.16.3.1 
%ASA-6-106100
access-list FW-DATA 
permitted 
tcp 
FW-DATA
172.16.1.4
59289 
OUTSIDE
52.87.195.145
22
hit-cnt 1 first hit&lt;/P&gt;&lt;P&gt;i created an external table in hive for this log file and i am trying to use HIVE SQL and regexp_extract to extract column out of this lines by using this regular expression.&lt;/P&gt;&lt;P&gt;([A-Z][a-z]+ \d+\s\d+:\d+:\d+)\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(\%ASA-6-106100):\s+(\w+-\w+\s+\w+-\w+)\s+(\w+)\s+(\w+)\s+(\w+-\w+)\/?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\((\d+)\)\s+-&amp;gt;\s+(\w+-?\w+?)\/?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\((\d+)\)\s+(\w+-\w+\s+\d+\s+\w+\s+\w+)&lt;/P&gt;&lt;P&gt;when i tested this at different sites (like &lt;A href="https://regex101.com/)" target="_blank"&gt;https://regex101.com/)&lt;/A&gt; ..it is working fine..breaking this line into many different groups shown below. but when i try to use that in HIVE its not working..any help is appriciated.&lt;/P&gt;&lt;P&gt;Mar 12 04:03:01 &lt;/P&gt;&lt;P&gt;
172.16.3.1 &lt;/P&gt;&lt;P&gt;
%ASA-6-106100 &lt;/P&gt;&lt;P&gt;access-list FW-DATA &lt;/P&gt;&lt;P&gt;
permitted &lt;/P&gt;&lt;P&gt;
tcp &lt;/P&gt;&lt;P&gt;
FW-DATA &lt;/P&gt;&lt;P&gt;172.16.1.4 &lt;/P&gt;&lt;P&gt;59289 &lt;/P&gt;&lt;P&gt;
OUTSIDE &lt;/P&gt;&lt;P&gt;52.87.195.145 &lt;/P&gt;&lt;P&gt;22 &lt;/P&gt;&lt;P&gt;hit-cnt 1 first hit
&lt;/P&gt;</description>
      <pubDate>Thu, 06 Apr 2017 23:38:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Help-with-Hive-Regex-extract/m-p/191014#M59099</guid>
      <dc:creator>saikrishna_tara</dc:creator>
      <dc:date>2017-04-06T23:38:35Z</dc:date>
    </item>
    <item>
      <title>Re: Help with Hive Regex extract.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Help-with-Hive-Regex-extract/m-p/191015#M59100</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/11732/saikrishnatarapareddy.html" nodeid="11732"&gt;@Saikrishna Tarapareddy&lt;/A&gt; - I think you accidentally asked the same question twice:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/93488/help-with-hive-regex-extract.html" target="_blank"&gt;https://community.hortonworks.com/questions/93488/help-with-hive-regex-extract.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Can you please close this one.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Apr 2017 00:50:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Help-with-Hive-Regex-extract/m-p/191015#M59100</guid>
      <dc:creator>namaheshwari</dc:creator>
      <dc:date>2017-04-07T00:50:53Z</dc:date>
    </item>
  </channel>
</rss>

