*Closed*: Pig SPLIT Syntax error, unexpected symbol

Contributor

I am using Pig to read a file, and I want to divide the data using `SPLIT`. Below is my input file:

1,aaa,123456,annotation
2,bbb,234567,barber
4,ddd,456789,federal
3,ccc,345678,code
4,ddd,456789,definition
5,asd,545645,AcsToGlRestServices
6,date,58314,filterlevel
7,kssa,22334,timefield
8,Bhi,2236,context

I executed the following Pig script:

grunt> rawlvl = load '~/file' using PigStorage(',') as (no:int,name:chararray,phno:int,add:chararray);
grunt> splitlvl = SPLIT rawlvl into one if (no>2 and no<5),two if (no>5); 

But I am getting an exception. Kindly help me understand why. Below is the exception:

2018-04-12 06:02:19,718 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 3, column 0> Syntax error, unexpected symbol at or near 'splitlvl' Details at logfile: /root/pig_1523510819542.log 

Below are the contents of the `pig_1523510819542.log` file:

================================================================================
Pig Stack Trace
---------------
ERROR 1200: <line 4, column 0>  Syntax error, unexpected symbol at or near 'splitlvl'
Failed to parse: <line 4, column 0>  Syntax error, unexpected symbol at or near 'splitlvl'
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:244)
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:182)
        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1791)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1764)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:707)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1075)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:505)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:231)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:206)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
        at org.apache.pig.Main.run(Main.java:566)
        at org.apache.pig.Main.main(Main.java:178)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
================================================================================

Kindly help me out here, as I am new to Hadoop and Pig.

1 ACCEPTED SOLUTION

Master Guru

@JAy PaTel

The problem is that you are assigning the result of SPLIT to the splitlvl relation. The SPLIT operator does not return a relation; it partitions rawlvl directly into the relations named after INTO (here, one and two), so there is nothing left to assign to splitlvl.

grunt> splitlvl = SPLIT rawlvl into one if (no>2 and no<5),two if (no>5);

Assigning the output of SPLIT to another relation (splitlvl) is not valid syntax in Pig.

Change your script to:

grunt> rawlvl = load '~/file' using PigStorage(',') as (no:int,name:chararray,phno:int,add:chararray);
grunt> SPLIT rawlvl into one if (no>2 and no<5),two if (no>5);
grunt> dump one;
grunt> dump two;

For more details about the SPLIT operator, please refer to the link below.
http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#SPLIT
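
To illustrate why the assignment form fails for SPLIT but works for other operators: an operator such as FILTER returns exactly one relation, so `alias = FILTER ...` is valid, whereas SPLIT names its own output relations after INTO. Below is a minimal sketch of the same two partitions written with FILTER. It assumes the schema from this thread and the '/t.txt' path used in the example; adjust the path to your own file.

```pig
-- Load the comma-separated input (the path is an assumption; use your own file).
rawlvl = LOAD '/t.txt' USING PigStorage(',') AS (no:int, name:chararray, phno:int, add:chararray);

-- FILTER returns a single relation, so assigning its result is valid.
one = FILTER rawlvl BY (no > 2 AND no < 5);
two = FILTER rawlvl BY (no > 5);

DUMP one;
DUMP two;
```

SPLIT expresses the same partitioning in a single statement, which is usually easier to read when there are several conditions.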

Example:-

Step1:-
Load the input file into Pig.

grunt> rawlvl = load '/t.txt' using PigStorage(',') as (no:int,name:chararray,phno:int,add:chararray);
grunt> dump rawlvl;
(1,aaa,123456,annotation)
(2,bbb,234567,barber)
(4,ddd,456789,federal)
(3,ccc,345678,code)
(4,ddd,456789,definition)
(5,asd,545645,AcsToGlRestServices)
(6,date,58314,filterlevel)
(7,kssa,22334,timefield)
(8,Bhi,2236,context)

The data is now loaded into the rawlvl relation.

Step2:-
Now split the rawlvl relation into two relations, one and two.

grunt> SPLIT rawlvl into one if (no>2 and no<5),two if (no>5);

Dump the one relation:

grunt> dump one;
(4,ddd,456789,federal)
(3,ccc,345678,code)
(4,ddd,456789,definition)

Dump the two relation:

grunt> dump two;
(6,date,58314,filterlevel)
(7,kssa,22334,timefield)
(8,Bhi,2236,context)

As you can see, the contents of the one and two relations match the conditions you specified.
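
Note that with these two conditions, the rows whose no is 1, 2, or 5 land in neither relation. If you also need those rows, newer Pig releases (0.10 and later, as far as I know; the r0.7.0 documentation linked above does not cover this) accept an OTHERWISE branch on SPLIT. A minimal sketch, assuming such a version; the relation name rest is only illustrative:

```pig
-- Same load as in Step1 (the path is taken from the example above).
rawlvl = LOAD '/t.txt' USING PigStorage(',') AS (no:int, name:chararray, phno:int, add:chararray);

-- OTHERWISE collects every row that matched none of the IF conditions.
-- Assumption: requires a Pig version that supports OTHERWISE.
SPLIT rawlvl INTO one IF (no > 2 AND no < 5),
                  two IF (no > 5),
                  rest OTHERWISE;

-- rest should hold the rows with no = 1, 2 and 5.
DUMP rest;
```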


If this answer helped resolve your issue, click the Accept button below to accept it. That helps other community users find solutions to this kind of issue quickly.


Contributor

@Shu Oh! I see. Thank you so much. I hadn't noticed those minor mistakes.