Created on 04-12-2017 09:13 PM - edited 08-17-2019 11:15 PM
Hi ,
I have line entries in my HDFS file as these , i created an external table on this file.
Mar 12 04:03:01 172.16.3.1 %ASA-6-106100: access-list FW-DATA permitted tcp FW-DATA/172.16.1.4(59289) -> OUTSIDE/52.87.195.145(22) hit-cnt 1 first hit [0xe37d63e0, 0x0]
i am trying break this into different columns using the HiveQL below.
select regexp_extract(log_entry, '[A-Z][a-z]+\s\d+\s\d+:\d+:\d+', 0) ,
regexp_extract(log_entry, '\d+\.\d+\.\d\.\d', 0) ,
regexp_extract(log_entry, '%ASA-6-106100', 0) ,
regexp_extract(log_entry, '\w+-\w+\s+\w+-\w+', 0) ,
regexp_extract(log_entry, 'permitted|denied', 0) ,
regexp_extract(log_entry, 'tcp|udp', 0) ,
regexp_extract(log_entry, '[A-Z]+-[A-Z]+', 0) ,
regexp_extract(log_entry, '\d+\.\d+\.\d\.\d', 0) from temp_firewall_log
values for _c1 and _c7 are same , showing the first pattern it found. but how can i show second pattern in _c7.?
and getting output like below..
Created 04-17-2017 06:52 PM
any idea on how to do this.?
Created 04-17-2017 07:21 PM
@Saikrishna Tarapareddy you mean to say that 172.16.1.4 should be generated?. The other way of doing it is split the data and then use regexp on the second string you got from split.
Example: SPLIT('hive:hadoop',':') returns ["hive","hadoop"]
Created 04-18-2017 11:30 AM
select regexp_extract(' permitted tcp FW-DATA/172.16.1.4(59289) -> OUTSIDE/52.87.195.145(22) ', '^.*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?$', 1) ip1, regexp_extract(' permitted tcp FW-DATA/172.16.1.4(59289) -> OUTSIDE/52.87.195.145(22) ', '^.*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?$', 2) ip2;
Created 04-18-2017 11:30 AM
This is a longer regex, assumed the log_entry meets 2 ip address displayed.