Reply
Highlighted
Explorer
Posts: 6
Registered: ‎02-15-2016

Multi line regex for Hive Regex Serde

I have been trying to put together a regex to create a Hive external table on a log (text) file, which has some log entries spanning across various lines; but I haven't been successful yet. 

 

Does anyone have any experience with such a requirement? Does Hive Regex Serde support multiple lines?

 

Here's the sample text from the log file I am working with:

 

[2016-12-31T14:30:03.917+00:00] [SomeText] [NOTIFICATION:1] [] [] [ecid: 005HFKbQ7Z_3zyZ8DyW0001DO000CH3,0:1:2:2:1:6] [tid: 514cd700] [13013] Init block, 'Sample Tables', has more variables than the query select list.
[2016-12-31T14:30:03.918+00:00] [SomeText] [NOTIFICATION:1] [] [] [ecid: 005HFKbPIy83zZ8DyW0001DO000CGz,0:1:2:2:1:6] [tid: 50875700] [13013] Init block, 'Sample Tables', has more variables than the query select list.
[2016-12-31T14:56:18.467+00:00] [SomeText] [NOTIFICATION:1] [] [] [ecid: 005HFM4dHTk3z8DyW0001A60020wY,0] [tid: 5fe23700] [59118] Operation Purge Query Plan Cache succeeded!
[2016-12-31T15:00:01.46+00:00] [SomeText] [ERROR:1] [] [] [ecid: 005HFMIa1ln3yZ8DyW0001A600211s,0:6] [tid: 64df7700] [nQSError: 17014] Could not connect to Oracle database. [[
file: server/Query/Execution/DbGateway/Oci8i/../Oci8/Src/SQXOci8.cpp; line: 373
Properties: description=RpScopeVar Exchange; producerID=1405455368; requestID=429096; sessionID=4236224; userName=User;
[nQSError: 17001] Oracle Error code: 12541, message: ORA-12541: TNS:no listener
at OCI call OCILogon.

********** Task: 1. Running for (mls): 82 **********
Description: DB Connect
DSN: TEXT; userName=USER

********** Task: 2. Running for (mls): 82 **********
Description: RpScopeVar GatewayDbGateway Prepare
DSN:Forecasting OLTP Connection Pool;userName:User
SQL:select max(DT) from APPS.TBL where TS is not null and DT <= SYSDATE

********** Task: 3. Running for (mls): 82 **********
Description: Producer::ExecuteOnce()
Activity type:Producer::ExecuteOnce()
Repository Name::Star Subject Area Name:: User Name::User
Logical Hash of SQL:: 0x0
SQL::

]]

================================================================

 

The goal is to write a regex and create Hive table on top of this file. The highlighted text should either be in a single field or broken into two fields - one containing the text before "[[" and 2nd containing everything between "[[" and "]]".

 

Any help would be greatly appreciated.

Announcements