How to process the SMS log data

Contributor

Hi Friends,

I have data like the following in my log files:

2017-06-04 08:01:08 Receive asd [SMSC:voda2] [SVC:usrCellfind].

Here the field delimiter is [ ], and inside the delimiters the fields have the form <<column name>>:<<value>>. Could you please let me know how to process this kind of log file?

3 REPLIES

Re: How to process the SMS log data

Super Collaborator

It depends on what you need to do with the fields.

Since the input follows a regular pattern, you could use regex matching to extract the required fields.

A = load 'file' using PigStorage() as (line:chararray);
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'(\\d{4}-\\d{2}-\\d{2}\\s\\d{2}:\\d{2}:\\d{2})\\s(.*?)\\s(.*?)\\s\\[(.*?)\\].*?\\[(.*?)\\]')) as (c1:chararray,c2:chararray,c3:chararray,c4:chararray,c5:chararray);
dump B;

(2017-06-04 08:01:08,Receive,asd,SMSC:voda2,SVC:usrCellfind)
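To sanity-check the pattern outside Pig, a minimal Python sketch could look like the one below. It rests on a few assumptions: that Python's re module treats these constructs the same way as the Java regex engine, that the whole line has to match the pattern (hence fullmatch), and that the sample line has no trailing period. The names parse_line and sample are only illustrative.

```python
import re

# Same pattern as in the Pig statement, written as a Python raw string
# (Pig's '\\d' inside a quoted string corresponds to '\d' here).
PATTERN = re.compile(
    r'(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(.*?)\s(.*?)\s\[(.*?)\].*?\[(.*?)\]'
)


def parse_line(line):
    """Return the five captured groups as a tuple, or None if the line does not match."""
    match = PATTERN.fullmatch(line)
    return match.groups() if match else None


# Sample line from the question, assumed to end at the closing bracket.
sample = '2017-06-04 08:01:08 Receive asd [SMSC:voda2] [SVC:usrCellfind]'
print(parse_line(sample))
```

With a whole-line match, anything after the final ] (for example a trailing period) makes the line fail to match, so it may be worth checking the raw data for trailing characters.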

If you can show the expected parsed result, maybe we can attack this problem in a better way.


EDIT



Based on the required output, I modified the regex as follows:

B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'(\\d{4}-\\d{2}-\\d{2}\\s\\d{2}:\\d{2}:\\d{2})\\s(.*?)\\s(.*?)\\s.*?\\[.*?:(.*?)\\].*?\\:(.*?)\\]')) as (c1:chararray,c2:chararray,c3:chararray,c4:chararray,c5:chararray);
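As a rough check of what the modified pattern captures, here is a small Python sketch under the same assumptions as before: Python's re syntax stands in for the Java regex engine, the whole line must match, and the line ends at the closing bracket. The names parse_values and sample are hypothetical.

```python
import re

# Modified pattern: the key before each ':' inside the brackets is skipped,
# so only the values (voda2, usrCellfind) are captured.
PATTERN = re.compile(
    r'(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(.*?)\s(.*?)\s.*?\[.*?:(.*?)\].*?\:(.*?)\]'
)


def parse_values(line):
    """Return (timestamp, action, name, value1, value2), or None if there is no match."""
    match = PATTERN.fullmatch(line)
    return match.groups() if match else None


sample = '2017-06-04 08:01:08 Receive asd [SMSC:voda2] [SVC:usrCellfind]'
fields = parse_values(sample)
# Join the fields the way a Pig dump shows a tuple.
print('(' + ','.join(fields) + ')')
```

The timestamp keeps its internal space because the first group captures date and time together.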

Also, this isn't restricted to Pig. If you need to use Hive, the same regex groups would help, since Hive supports a regex SerDe (RegexSerDe). There is an excellent article on the subject.


Re: How to process the SMS log data

Contributor

I am getting this error: mismatched input 'as' expecting RIGHT_PAREN

Also, I would like the output to look like this:

(2017-06-0408:01:08,Receive,asd,voda2,usrCellfind)

Re: How to process the SMS log data

Contributor

Thanks, Arun, for your help.