Support Questions

pateljay · ‎04-12-2018

I am using Pig to read a file and want to divide its data with `SPLIT`. Below is my input file:

1,aaa,123456,annotation
2,bbb,234567,barber
4,ddd,456789,federal
3,ccc,345678,code
4,ddd,456789,definition
5,asd,545645,AcsToGlRestServices
6,date,58314,filterlevel
7,kssa,22334,timefield
8,Bhi,2236,context

I ran the following Pig script:

grunt> rawlvl = load '~/file' using PigStorage(',') as (no:int,name:chararray,phno:int,add:chararray);
grunt> splitlvl = SPLIT rawlvl into one if (no>2 and no<5),two if (no>5);

But I get an exception. Please help me understand why. The exception is:

2018-04-12 06:02:19,718 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 3, column 0> Syntax error, unexpected symbol at or near 'splitlvl' Details at logfile: /root/pig_1523510819542.log

Below are the contents of the `pig_1523510819542.log` file:

================================================================================
Pig Stack Trace
---------------
ERROR 1200: <line 4, column 0>  Syntax error, unexpected symbol at or near 'splitlvl'
Failed to parse: <line 4, column 0>  Syntax error, unexpected symbol at or near 'splitlvl'
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:244)
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:182)
        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1791)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1764)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:707)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1075)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:505)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:231)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:206)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
        at org.apache.pig.Main.run(Main.java:566)
        at org.apache.pig.Main.main(Main.java:178)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
================================================================================

Kindly help me out here, as I am new to Hadoop and Pig.

Shu_ashu · ‎04-12-2018

@JAy PaTel

The issue is that you are assigning the result of SPLIT to the splitlvl relation. SPLIT already divides the rawlvl relation into the relations one and two, so it does not produce a result that can be assigned to another relation.

grunt> splitlvl = SPLIT rawlvl into one if (no>2 and no<5),two if (no>5);

Assigning the output of SPLIT to another relation (splitlvl) is not valid syntax in Pig.

Change your script to:

grunt> rawlvl = load '~/file' using PigStorage(',') as (no:int,name:chararray,phno:int,add:chararray);
grunt> SPLIT rawlvl into one if (no>2 and no<5),two if (no>5);
grunt> dump one;
grunt> dump two;
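
If an assignment-style statement is preferred, the same two relations can also be built with one FILTER per condition, because FILTER (unlike SPLIT) does produce a relation that is assigned with `=`. A minimal sketch, assuming the schema from the question and the '/t.txt' path used in the example further down in this answer:

```pig
-- Load the comma-separated file with the schema from the question.
rawlvl = LOAD '/t.txt' USING PigStorage(',') AS (no:int, name:chararray, phno:int, add:chararray);

-- Each FILTER yields exactly one relation, so assignment is valid here.
one = FILTER rawlvl BY (no > 2 AND no < 5);
two = FILTER rawlvl BY (no > 5);

DUMP one;
DUMP two;
```

This should select the same rows as the SPLIT version. SPLIT is simply the more compact way to apply several conditions to one input.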

For more details about the SPLIT operator, please refer to the link below.
http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#SPLIT

Example:-

Step1:-
Load the input file into Pig:

grunt> rawlvl = load '/t.txt' using PigStorage(',') as (no:int,name:chararray,phno:int,add:chararray);
grunt> dump rawlvl;
(1,aaa,123456,annotation)
(2,bbb,234567,barber)
(4,ddd,456789,federal)
(3,ccc,345678,code)
(4,ddd,456789,definition)
(5,asd,545645,AcsToGlRestServices)
(6,date,58314,filterlevel)
(7,kssa,22334,timefield)
(8,Bhi,2236,context)

The data is now loaded into the rawlvl relation.

Step2:-
Now split the rawlvl relation into two relations, one and two:

grunt> SPLIT rawlvl into one if (no>2 and no<5),two if (no>5);

Dump the relation one:

grunt> dump one;
(4,ddd,456789,federal)
(3,ccc,345678,code)
(4,ddd,456789,definition)

Dump the relation two:

grunt> dump two;
(6,date,58314,filterlevel)
(7,kssa,22334,timefield)
(8,Bhi,2236,context)

As you can see, the output of the relations one and two matches the conditions you specified.
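
Note that the rows with no = 1, 2 and 5 appear in neither output: SPLIT sends a tuple only to the relations whose condition it satisfies, so a tuple that satisfies none is dropped. If those rows are needed, later Pig releases accept an OTHERWISE branch. A sketch, assuming Pig 0.10 or later (the r0.7.0 reference linked above does not list this clause); the relation name `rest` is made up for illustration:

```pig
-- Same load as in Step1.
rawlvl = LOAD '/t.txt' USING PigStorage(',') AS (no:int, name:chararray, phno:int, add:chararray);

-- 'rest' receives every tuple that matched neither of the two conditions.
SPLIT rawlvl INTO one IF (no > 2 AND no < 5), two IF (no > 5), rest OTHERWISE;

DUMP rest;
```

With the sample data above, rest should hold the rows whose no is 1, 2 or 5.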

If this answer helped resolve your issue, click the Accept button below to accept it. That helps other Community users find solutions to this kind of issue quickly.

pateljay · ‎04-12-2018

@Shu Oh! I see. Thank you so much. I didn't notice those minor mistakes.
