Created 04-06-2017 04:30 PM
Hi,
i have a firewall log with entries like this..
Mar 12 04:03:01 172.16.3.1 %ASA-6-106100 access-list FW-DATA permitted tcp FW-DATA 172.16.1.4 59289 OUTSIDE 52.87.195.145 22 hit-cnt 1 first hit
i created an external table in hive for this log file and i am trying to use HIVE SQL and regexp_extract to extract column out of this lines by using this regular expression.
([A-Z][a-z]+ \d+\s\d+:\d+:\d+)\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(\%ASA-6-106100):\s+(\w+-\w+\s+\w+-\w+)\s+(\w+)\s+(\w+)\s+(\w+-\w+)\/?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\((\d+)\)\s+->\s+(\w+-?\w+?)\/?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\((\d+)\)\s+(\w+-\w+\s+\d+\s+\w+\s+\w+)
when i tested this at different sites (like https://regex101.com/) ..it is working fine..breaking this line into many different groups shown below. but when i try to use that in HIVE its not working..any help is appriciated.
Mar 12 04:03:01
172.16.3.1
%ASA-6-106100
access-list FW-DATA
permitted
tcp
FW-DATA
172.16.1.4
59289
OUTSIDE
52.87.195.145
22
hit-cnt 1 first hit
Created 04-06-2017 09:54 PM
I think I found the answer..looks like we need to use double slashes in hive..
this is working when I replaced log_entry with the text from above..
select regexp_extract(log_entry, '[A-Z][a-z]+\\s\\d+\\s\\d+:\\d+:\\d+', 0) , regexp_extract(log_entry, '\\d+\\.\\d+\\.\\d\\.\\d', 0) , regexp_extract(log_entry, '%ASA-6-106100', 0) , regexp_extract(log_entry, '\\w+-\\w+\\s+\\w+-\\w+', 0)
Created 04-06-2017 07:18 PM
Could you help me with your query? Is it throwing an error when you are using it in hive or the data which you were expecting is not replicated by the code?
Created 04-06-2017 08:53 PM
when I use the following SQL its not returning me anything..i am not getting any error.
select regexp_extract('Mar 12 04:03:01 172.16.3.1 %ASA-6-106100: access-list FW-DATA permitted tcp FW-DATA/172.16.1.4(59289) -> OUTSIDE/52.87.195.145(22) hit-cnt 1 first hit', '([A-Z][a-z]+ \d+\s\d+:\d+:\d+)\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(\%ASA-6-106100):\s+(\w+-\w+\s+\w+-\w+)\s+(\w+)\s+(\w+)\s+(\w+-\w+)\/?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\((\d+)\)\s+->\s+(\w+-?\w+?)\/?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\((\d+)\)\s+(\w+-\w+\s+\d+\s+\w+\s+\w+)',0)
How do I get my input string broken in to multiple columns based on the regex.?
Created 04-06-2017 09:49 PM
You can consider to use Hive RegexSerDe, see here for details.
Created 04-06-2017 09:54 PM
I think I found the answer..looks like we need to use double slashes in hive..
this is working when I replaced log_entry with the text from above..
select regexp_extract(log_entry, '[A-Z][a-z]+\\s\\d+\\s\\d+:\\d+:\\d+', 0) , regexp_extract(log_entry, '\\d+\\.\\d+\\.\\d\\.\\d', 0) , regexp_extract(log_entry, '%ASA-6-106100', 0) , regexp_extract(log_entry, '\\w+-\\w+\\s+\\w+-\\w+', 0)