Support Questions

saikrishna_tara · ‎04-12-2017

Hi ,

I have line entries in my HDFS file as these , i created an external table on this file.

Mar 12 04:03:01 172.16.3.1 %ASA-6-106100: access-list FW-DATA permitted tcp FW-DATA/172.16.1.4(59289) -> OUTSIDE/52.87.195.145(22) hit-cnt 1 first hit [0xe37d63e0, 0x0]

i am trying break this into different columns using the HiveQL below.

select regexp_extract(log_entry, '[A-Z][a-z]+\s\d+\s\d+:\d+:\d+', 0) ,

regexp_extract(log_entry, '\d+\.\d+\.\d\.\d', 0) ,

regexp_extract(log_entry, '%ASA-6-106100', 0) ,

regexp_extract(log_entry, '\w+-\w+\s+\w+-\w+', 0) ,

regexp_extract(log_entry, 'permitted|denied', 0) ,

regexp_extract(log_entry, 'tcp|udp', 0) ,

regexp_extract(log_entry, '[A-Z]+-[A-Z]+', 0) ,

regexp_extract(log_entry, '\d+\.\d+\.\d\.\d', 0) from temp_firewall_log

values for _c1 and _c7 are same , showing the first pattern it found. but how can i show second pattern in _c7.?

and getting output like below..

saikrishna_tara · ‎04-17-2017

any idea on how to do this.?

balavignesh_nag · ‎04-17-2017

@Saikrishna Tarapareddy you mean to say that 172.16.1.4 should be generated?. The other way of doing it is split the data and then use regexp on the second string you got from split.

Example: SPLIT('hive:hadoop',':') returns ["hive","hadoop"]

yangjy · ‎04-18-2017

select 
regexp_extract(' permitted tcp FW-DATA/172.16.1.4(59289) -> OUTSIDE/52.87.195.145(22) ', '^.*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?$', 1) ip1, 
regexp_extract(' permitted tcp FW-DATA/172.16.1.4(59289) -> OUTSIDE/52.87.195.145(22) ', '^.*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?$', 2) ip2;

yangjy · ‎04-18-2017

This is a longer regex, assumed the log_entry meets 2 ip address displayed.

Cloudera Community

Support Questions

Hive RegEx finding second pattern