Support Questions

Find answers, ask questions, and share your expertise

Hive RegEx finding second pattern

avatar
Super Collaborator

Hi ,

I have line entries in my HDFS file as these , i created an external table on this file.

Mar 12 04:03:01 172.16.3.1 %ASA-6-106100: access-list FW-DATA permitted tcp FW-DATA/172.16.1.4(59289) -> OUTSIDE/52.87.195.145(22) hit-cnt 1 first hit [0xe37d63e0, 0x0]

i am trying break this into different columns using the HiveQL below.

select regexp_extract(log_entry, '[A-Z][a-z]+\s\d+\s\d+:\d+:\d+', 0) ,

regexp_extract(log_entry, '\d+\.\d+\.\d\.\d', 0) ,

regexp_extract(log_entry, '%ASA-6-106100', 0) ,

regexp_extract(log_entry, '\w+-\w+\s+\w+-\w+', 0) ,

regexp_extract(log_entry, 'permitted|denied', 0) ,

regexp_extract(log_entry, 'tcp|udp', 0) ,

regexp_extract(log_entry, '[A-Z]+-[A-Z]+', 0) ,

regexp_extract(log_entry, '\d+\.\d+\.\d\.\d', 0) from temp_firewall_log

values for _c1 and _c7 are same , showing the first pattern it found. but how can i show second pattern in _c7.?

and getting output like below..

14588-regexo.jpg

4 REPLIES 4

avatar
Super Collaborator

any idea on how to do this.?

avatar

@Saikrishna Tarapareddy you mean to say that 172.16.1.4 should be generated?. The other way of doing it is split the data and then use regexp on the second string you got from split.

Example: SPLIT('hive:hadoop',':') returns ["hive","hadoop"]

avatar
New Contributor
select 
regexp_extract(' permitted tcp FW-DATA/172.16.1.4(59289) -> OUTSIDE/52.87.195.145(22) ', '^.*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?$', 1) ip1, 
regexp_extract(' permitted tcp FW-DATA/172.16.1.4(59289) -> OUTSIDE/52.87.195.145(22) ', '^.*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?$', 2) ip2;

avatar
New Contributor

This is a longer regex, assumed the log_entry meets 2 ip address displayed.