Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Help with Hive Regex extract.

avatar
Super Collaborator

Hi,

i have a firewall log with entries like this..

Mar 12 04:03:01 172.16.3.1 %ASA-6-106100 access-list FW-DATA permitted tcp FW-DATA 172.16.1.4 59289 OUTSIDE 52.87.195.145 22 hit-cnt 1 first hit

i created an external table in hive for this log file and i am trying to use HIVE SQL and regexp_extract to extract column out of this lines by using this regular expression.

([A-Z][a-z]+ \d+\s\d+:\d+:\d+)\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(\%ASA-6-106100):\s+(\w+-\w+\s+\w+-\w+)\s+(\w+)\s+(\w+)\s+(\w+-\w+)\/?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\((\d+)\)\s+->\s+(\w+-?\w+?)\/?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\((\d+)\)\s+(\w+-\w+\s+\d+\s+\w+\s+\w+)

when i tested this at different sites (like https://regex101.com/) ..it is working fine..breaking this line into many different groups shown below. but when i try to use that in HIVE its not working..any help is appriciated.

Mar 12 04:03:01

172.16.3.1

%ASA-6-106100

access-list FW-DATA

permitted

tcp

FW-DATA

172.16.1.4

59289

OUTSIDE

52.87.195.145

22

hit-cnt 1 first hit

1 ACCEPTED SOLUTION

avatar
Super Collaborator

I think I found the answer..looks like we need to use double slashes in hive..

this is working when I replaced log_entry with the text from above..

select regexp_extract(log_entry, '[A-Z][a-z]+\\s\\d+\\s\\d+:\\d+:\\d+', 0) , regexp_extract(log_entry, '\\d+\\.\\d+\\.\\d\\.\\d', 0) , regexp_extract(log_entry, '%ASA-6-106100', 0) , regexp_extract(log_entry, '\\w+-\\w+\\s+\\w+-\\w+', 0)

View solution in original post

4 REPLIES 4

avatar

@Saikrishna Tarapareddy

Could you help me with your query? Is it throwing an error when you are using it in hive or the data which you were expecting is not replicated by the code?

avatar
Super Collaborator

Hi @Bala Vignesh N V

when I use the following SQL its not returning me anything..i am not getting any error.

select regexp_extract('Mar 12 04:03:01 172.16.3.1 %ASA-6-106100: access-list FW-DATA permitted tcp FW-DATA/172.16.1.4(59289) -> OUTSIDE/52.87.195.145(22) hit-cnt 1 first hit', '([A-Z][a-z]+ \d+\s\d+:\d+:\d+)\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(\%ASA-6-106100):\s+(\w+-\w+\s+\w+-\w+)\s+(\w+)\s+(\w+)\s+(\w+-\w+)\/?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\((\d+)\)\s+->\s+(\w+-?\w+?)\/?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})\((\d+)\)\s+(\w+-\w+\s+\d+\s+\w+\s+\w+)',0)

How do I get my input string broken in to multiple columns based on the regex.?

avatar
Master Guru

You can consider to use Hive RegexSerDe, see here for details.

avatar
Super Collaborator

I think I found the answer..looks like we need to use double slashes in hive..

this is working when I replaced log_entry with the text from above..

select regexp_extract(log_entry, '[A-Z][a-z]+\\s\\d+\\s\\d+:\\d+:\\d+', 0) , regexp_extract(log_entry, '\\d+\\.\\d+\\.\\d\\.\\d', 0) , regexp_extract(log_entry, '%ASA-6-106100', 0) , regexp_extract(log_entry, '\\w+-\\w+\\s+\\w+-\\w+', 0)