Pig Error : ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing

Expert Contributor

Hi All,

While processing a CSV dataset (from here: Link) in Pig, I get the error below. There seems to be a delimiter problem in the file: if I create the same file manually, the data loads properly.

Pig Script:

A = LOAD 's3a://byr-heor-test/dev1/BJsales.csv' using PigStorage(',') as (Num:Int,time:int,BJsales:int)

Output:

..
..
(149,149,262)
(150,150,262)
2016-12-27 09:31:35,632 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " <PATH> "2 "" at line 3, column 8.
Was expecting one of:


1 ACCEPTED SOLUTION

Master Mentor

@Vaibhav Kumar

The recommendations from my colleagues are valid: the header row of your CSV documents contains strings. You can certainly filter it out by some known entity, but there is a more advanced CSV loader for Pig called CSVExcelStorage. It is part of the Piggybank library, which comes bundled with HDP, hence the register command. You can pass different control parameters to it. The Mortar blog is an excellent source of information on working with Pig: http://help.mortardata.com/technologies/pig/csv.

grunt> register /usr/hdp/current/pig-client/piggybank.jar;
grunt> a = load 'BJsales.csv' using org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'NOCHANGE', 'SKIP_INPUT_HEADER') as (Num:Int,time:int,BJsales:float);
grunt> describe a;
a: {Num: int,time: int,BJsales: float}
grunt> b = limit a 5;
grunt> dump b;

output

(1,1,200.1)
(2,2,199.5)
(3,3,199.4)
(4,4,198.9)
(5,5,199.0)

Notice that I am not filtering any relation; I'm telling the loader to skip the header outright. It saves a few keystrokes and doesn't waste any cycles processing anything extra.
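For comparison, the filtering approach mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: it keeps rows whose time field is purely numeric (a stand-in for filtering on a known header value), and it assumes the first column is wrapped in double quotes, as noted elsewhere in this thread. The relation names raw, no_header and clean are made up for the example.

```pig
-- Load every field as text so the header row and quoted values survive the load
raw = LOAD 'BJsales.csv' USING PigStorage(',')
      AS (num:chararray, time:chararray, bjsales:chararray);

-- Keep only rows whose time field is all digits; this drops the header row
-- (assumption: the header's time field is not numeric)
no_header = FILTER raw BY time MATCHES '[0-9]+';

-- Strip the double quotes from the first column (assumed to be quoted), then cast
clean = FOREACH no_header GENERATE
        (int) REPLACE(num, '"', '') AS num,
        (int) time AS time,
        (float) bjsales AS bjsales;

DESCRIBE clean;
```

This costs an extra FILTER and FOREACH over the data, which is the overhead the SKIP_INPUT_HEADER option avoids.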

4 REPLIES

Looking at the BJsales.csv file, it seems the first column is of string type. Make sure to use the proper data types. Also remove any empty rows at the end of the file.

Expert Contributor

Every field here is an integer or a float, so I gave int to all.

Super Collaborator

To add to @milind pandit: I tried opening the AirPassengers file. The first column is enclosed in quotes. The same is true for BJsales.csv.
