
How to process the SMS log data



Hi Friends,

I have data like the following in my log files:

2017-06-04 08:01:08 Receive asd [SMSC:voda2] [SVC:usrCellfind].

Here the field delimiter is [ ], and inside the brackets each field has the form <<column name>>:<<value>>. Could you please let me know how to process this kind of log file?
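One way to prototype this before writing a Pig or Hive job is a small Python sketch that splits each line into the leading timestamp, the plain space-separated words, and a dictionary of the bracketed <<column name>>:<<value>> pairs. The names HEAD_RE, FIELD_RE and parse_line are hypothetical, and the sketch assumes every line starts with a yyyy-MM-dd HH:mm:ss timestamp, that column names contain no ':' and that values contain no ']'.

```python
import re

# Leading timestamp, then everything up to the first '['.
HEAD_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+([^\[]*)")
# Each bracketed field is [<column name>:<value>].
# Assumption: column names contain no ':' and values contain no ']'.
FIELD_RE = re.compile(r"\[([^:\]]+):([^\]]*)\]")


def parse_line(line):
    """Split one log line into (timestamp, plain words, {column name: value})."""
    head = HEAD_RE.match(line)
    if head is None:
        return None  # line does not start with the expected timestamp
    timestamp, rest = head.group(1), head.group(2)
    words = rest.split()
    fields = dict(FIELD_RE.findall(line))
    return timestamp, words, fields


LINE = "2017-06-04 08:01:08 Receive asd [SMSC:voda2] [SVC:usrCellfind]."
print(parse_line(LINE))
```

Because the bracketed fields end up keyed by column name, lines with extra or reordered [name:value] fields still parse, which is harder to achieve with a fixed positional regex.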


Re: How to process the SMS log data


It depends on what you need to do with the fields.

Since the input follows a regular pattern, you could use regex matching to extract the required fields.

A = load 'file' using PigStorage() as (line:chararray);
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'(\\d{4}-\\d{2}-\\d{2}\\s\\d{2}:\\d{2}:\\d{2})\\s(.*?)\\s(.*?)\\s\\[(.*?)\\].*?\\[(.*?)\\]')) as (c1:chararray,c2:chararray,c3:chararray,c4:chararray,c5:chararray);
dump B;

(2017-06-04 08:01:08,Receive,asd,SMSC:voda2,SVC:usrCellfind)
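To check a pattern outside Pig, the same regex can be tried in Python. As far as I know, REGEX_EXTRACT_ALL by default requires the pattern to match the whole line (Java's Matcher.matches()), so this sketch uses re.fullmatch to mirror that; the function name extract_all is hypothetical. The Pig string literal doubles every backslash, while a Python raw string needs only one.

```python
import re

# Same pattern as the Pig script, written as a raw string with single backslashes.
PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(.*?)\s(.*?)\s\[(.*?)\].*?\[(.*?)\]"
)


def extract_all(line):
    """Return all capture groups if the WHOLE line matches, else None."""
    m = PATTERN.fullmatch(line)
    return m.groups() if m else None


print(extract_all("2017-06-04 08:01:08 Receive asd [SMSC:voda2] [SVC:usrCellfind]"))
print(extract_all("2017-06-04 08:01:08 Receive asd [SMSC:voda2] [SVC:usrCellfind]."))
```

The second call returns None because of the trailing period: with whole-line matching, any characters after the last ] make the match fail. If the real log lines do end with a period, appending .* to the pattern would let them match.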

If you can show the expected parsed result, maybe we can approach this problem in a better way.


Based on the required output, I modified the regex so that it captures only the value after the colon inside each pair of brackets:

B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'(\\d{4}-\\d{2}-\\d{2}\\s\\d{2}:\\d{2}:\\d{2})\\s(.*?)\\s(.*?)\\s.*?\\[.*?:(.*?)\\].*?\\:(.*?)\\]')) as (c1:chararray,c2:chararray,c3:chararray,c4:chararray,c5:chararray);
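The modified pattern skips the <<column name>>: prefix inside each pair of brackets, so the last two groups capture only the values. A quick Python check of that pattern, assuming whole-line matching (re.fullmatch) to mirror REGEX_EXTRACT_ALL; the helper names extract_values and as_pig_tuple are hypothetical.

```python
import re

# The modified Pig pattern, with single backslashes in a raw string.
# Groups 4 and 5 start after the ':' inside each bracket, so only the values are kept.
VALUES_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(.*?)\s(.*?)\s.*?\[.*?:(.*?)\].*?\:(.*?)\]"
)


def extract_values(line):
    """Return the five captured groups, or None if the whole line does not match."""
    m = VALUES_RE.fullmatch(line)
    return m.groups() if m else None


def as_pig_tuple(fields):
    """Format the groups roughly the way Pig's dump prints a tuple of chararrays."""
    return "(" + ",".join(fields) + ")"


fields = extract_values("2017-06-04 08:01:08 Receive asd [SMSC:voda2] [SVC:usrCellfind]")
print(as_pig_tuple(fields))  # (2017-06-04 08:01:08,Receive,asd,voda2,usrCellfind)
```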

Also, this isn't restricted to Pig. If you need to use Hive, the same regex groups would help, since Hive supports RegexSerDe. There is an excellent article on this.

Re: How to process the SMS log data


I'm getting this error: mismatched input 'as' expecting RIGHT_PAREN

Also, I want to get the output as:

(2017-06-04 08:01:08,Receive,asd,voda2,usrCellfind)

Re: How to process the SMS log data


Thanks, Arun, for your help.