Hi, how do I parse this log file with Pig?

New Contributor

03:00:00,685 INFO [tr.com.anadolubank.gm.server.ANDDefaultServiceExecuter] (http-/0.0.0.0:8080-1) [31e432d4-6a89-4828-9c24-0f1d596eed23][10.40.26.49][WEB_AUTHENTICATE] started
03:00:00,703 INFO [tr.com.anadolubank.gm.server.ANDDefaultServiceExecuter] (http-/0.0.0.0:8080-1) [31e432d4-6a89-4828-9c24-0f1d596eed23][10.40.26.49][WEB_AUTHENTICATE] executed in 18 ms
03:00:00,898 INFO [tr.com.anadolubank.gm.server.ANDDefaultServiceExecuter] (http-/0.0.0.0:8080-1) [88898a09-0664-4a77-bc53-3d428712e4ef][10.40.26.49][sessionKill] started
03:00:00,947 INFO [tr.com.anadolubank.gm.server.ANDDefaultServiceExecuter] (http-/0.0.0.0:8080-1) [88898a09-0664-4a77-bc53-3d428712e4ef][10.40.26.49][sessionKill] executed in 49 ms

Re: Hi, how do I parse this log file with Pig?

Expert Contributor

It depends on what you're trying to do, but a good first step is probably to tokenize each line on spaces:

-- sample2.txt
-- 03:00:00,685 INFO [tr.com.anadolubank.gm.server.ANDDefaultServiceExecuter] (http-/0.0.0.0:8080-1) [31e432d4-6a89-4828-9c24-0f1d596eed23][10.40.26.49][WEB_AUTHENTICATE] started
-- 03:00:00,703 INFO [tr.com.anadolubank.gm.server.ANDDefaultServiceExecuter] (http-/0.0.0.0:8080-1) [31e432d4-6a89-4828-9c24-0f1d596eed23][10.40.26.49][WEB_AUTHENTICATE] executed in 18 ms
-- 03:00:00,898 INFO [tr.com.anadolubank.gm.server.ANDDefaultServiceExecuter] (http-/0.0.0.0:8080-1) [88898a09-0664-4a77-bc53-3d428712e4ef][10.40.26.49][sessionKill] started
-- 03:00:00,947 INFO [tr.com.anadolubank.gm.server.ANDDefaultServiceExecuter] (http-/0.0.0.0:8080-1) [88898a09-0664-4a77-bc53-3d428712e4ef][10.40.26.49][sessionKill] executed in 49 ms


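-- Load each line as one chararray field (the default loader splits on tabs; these lines contain none)
A = LOAD '/user/admin/sample2.txt' AS (line:chararray);
-- Split each line on spaces; TOKENIZE returns a bag with one tuple per token
X = FOREACH A GENERATE TOKENIZE(line, ' ');
DUMP X;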


-- results
-- ({(03:00:00,685),(INFO),([tr.com.anadolubank.gm.server.ANDDefaultServiceExecuter]),((http-/0.0.0.0:8080-1)),([31e432d4-6a89-4828-9c24-0f1d596eed23][10.40.26.49][WEB_AUTHENTICATE]),(started)})
-- ({(03:00:00,703),(INFO),([tr.com.anadolubank.gm.server.ANDDefaultServiceExecuter]),((http-/0.0.0.0:8080-1)),([31e432d4-6a89-4828-9c24-0f1d596eed23][10.40.26.49][WEB_AUTHENTICATE]),(executed),(in),(18),(ms)})
-- ({(03:00:00,898),(INFO),([tr.com.anadolubank.gm.server.ANDDefaultServiceExecuter]),((http-/0.0.0.0:8080-1)),([88898a09-0664-4a77-bc53-3d428712e4ef][10.40.26.49][sessionKill]),(started)})
-- ({(03:00:00,947),(INFO),([tr.com.anadolubank.gm.server.ANDDefaultServiceExecuter]),((http-/0.0.0.0:8080-1)),([88898a09-0664-4a77-bc53-3d428712e4ef][10.40.26.49][sessionKill]),(executed),(in),(49),(ms)})
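
If you need named fields rather than a bag of tokens, one possible next step is to match each line against a regular expression. The following is only a sketch: it assumes every line follows the layout of the four sample lines, and the field names (ts, level, logger, thread, request_id, client_ip, service, message, elapsed_ms) are made up for illustration and are not part of the original log format. Adjust the pattern to your real data.

```pig
-- Sketch: split each log line into named fields with one regular expression.
-- Assumes the same sample file as above; all field names are illustrative.
A = LOAD '/user/admin/sample2.txt' AS (line:chararray);

-- REGEX_EXTRACT_ALL returns a tuple of the capture groups, or null when the
-- whole line does not match. Backslashes are doubled inside Pig string literals.
B = FOREACH A GENERATE REGEX_EXTRACT_ALL(line,
      '^\\s*(\\S+)\\s+(\\S+)\\s+\\[([^\\]]+)\\]\\s+\\(([^)]+)\\)\\s+\\[([^\\]]+)\\]\\[([^\\]]+)\\]\\[([^\\]]+)\\]\\s+(.*)$')
    AS parts;

-- Drop lines that did not match before flattening.
C = FILTER B BY parts IS NOT NULL;

D = FOREACH C GENERATE FLATTEN(parts)
    AS (ts:chararray, level:chararray, logger:chararray, thread:chararray,
        request_id:chararray, client_ip:chararray, service:chararray,
        message:chararray);

-- Pull the duration out of "executed in N ms" messages; it is null for "started" lines.
E = FOREACH D GENERATE ts, level, request_id, client_ip, service, message,
      (int)REGEX_EXTRACT(message, '^executed in (\\d+) ms$', 1) AS elapsed_ms;

DUMP E;
```

With the fields separated like this, you could for example GROUP E BY service and average elapsed_ms, or group by request_id to pair each "started" line with its "executed" line.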