- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Hive RegEx finding second pattern
- Labels:
-
Apache Hive
Created on ‎04-12-2017 09:13 PM - edited ‎08-17-2019 11:15 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi ,
I have line entries in my HDFS file as these , i created an external table on this file.
Mar 12 04:03:01 172.16.3.1 %ASA-6-106100: access-list FW-DATA permitted tcp FW-DATA/172.16.1.4(59289) -> OUTSIDE/52.87.195.145(22) hit-cnt 1 first hit [0xe37d63e0, 0x0]
i am trying break this into different columns using the HiveQL below.
select regexp_extract(log_entry, '[A-Z][a-z]+\s\d+\s\d+:\d+:\d+', 0) ,
regexp_extract(log_entry, '\d+\.\d+\.\d\.\d', 0) ,
regexp_extract(log_entry, '%ASA-6-106100', 0) ,
regexp_extract(log_entry, '\w+-\w+\s+\w+-\w+', 0) ,
regexp_extract(log_entry, 'permitted|denied', 0) ,
regexp_extract(log_entry, 'tcp|udp', 0) ,
regexp_extract(log_entry, '[A-Z]+-[A-Z]+', 0) ,
regexp_extract(log_entry, '\d+\.\d+\.\d\.\d', 0) from temp_firewall_log
values for _c1 and _c7 are same , showing the first pattern it found. but how can i show second pattern in _c7.?
and getting output like below..
Created ‎04-17-2017 06:52 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
any idea on how to do this.?
Created ‎04-17-2017 07:21 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Saikrishna Tarapareddy you mean to say that 172.16.1.4 should be generated?. The other way of doing it is split the data and then use regexp on the second string you got from split.
Example: SPLIT('hive:hadoop',':') returns ["hive","hadoop"]
Created ‎04-18-2017 11:30 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
select regexp_extract(' permitted tcp FW-DATA/172.16.1.4(59289) -> OUTSIDE/52.87.195.145(22) ', '^.*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?$', 1) ip1, regexp_extract(' permitted tcp FW-DATA/172.16.1.4(59289) -> OUTSIDE/52.87.195.145(22) ', '^.*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?(\\d+\\.\\d+\\.\\d+\\.\\d+).*?$', 2) ip2;
Created ‎04-18-2017 11:30 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This is a longer regex, assumed the log_entry meets 2 ip address displayed.
