Support Questions

kumarvaibhav199 · ‎12-27-2016

Hi All,

While trying to process my data in Pig (a CSV dataset from here: Link), I'm getting the error below. There seems to be a delimiter problem in the file. If I create the same file manually, the data loads properly.

Pig Script:

A = LOAD 's3a://byr-heor-test/dev1/BJsales.csv' using PigStorage(',') as (Num:Int,time:int,BJsales:int)

Output:

..
..
(149,149,262)
(150,150,262)
2016-12-27 09:31:35,632 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " <PATH> "2 "" at line 3, column 8.
Was expecting one of:

aervits · ‎02-01-2017

@Vaibhav Kumar

The recommendations from my colleagues are valid: you have strings in the header row of your CSV documents. You can certainly filter by some known entity, but there's a more advanced CSV loader for Pig called CSVExcelStorage. It is part of the Piggybank library that comes bundled with HDP, hence the register command. You can pass different control parameters to it. The Mortar blog is an excellent source of information on working with Pig: http://help.mortardata.com/technologies/pig/csv.

grunt> register /usr/hdp/current/pig-client/piggybank.jar;
grunt> a = load 'BJsales.csv' using org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'NOCHANGE', 'SKIP_INPUT_HEADER') as (Num:Int,time:int,BJsales:float);
grunt> describe a;
a: {Num: int,time: int,BJsales: float}
grunt> b = limit a 5;
grunt> dump b;

output

(1,1,200.1)
(2,2,199.5)
(3,3,199.4)
(4,4,198.9)
(5,5,199.0)

Notice that I am not filtering any relation; I'm telling the loader to skip the header outright. It saves a few keystrokes and doesn't waste any cycles processing anything extra.
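
For comparison, the filtering alternative mentioned in this answer could look roughly like the sketch below. It is not from the original post: it assumes the header line's second field is text rather than a number, so Pig's cast to int yields null for that row, and the relation names raw, data and first5 are only illustrative.

```pig
-- Sketch only: assumes the header's second field is non-numeric,
-- so the int cast produces null for the header row.
raw = LOAD 'BJsales.csv' USING PigStorage(',') AS (Num:chararray, time:int, BJsales:float);

-- Keep only rows whose time field parsed as an int; this drops the header row.
data = FILTER raw BY time IS NOT NULL;

first5 = LIMIT data 5;
DUMP first5;
```

A filter like this would also silently drop any data row whose time value fails to parse, which is one more reason to prefer letting the loader skip the header.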

mpandit · ‎12-27-2016

Looking at the BJsales.csv file, it seems the first column is of string type. Make sure to use the proper datatypes. Also remove any empty rows at the end of the file.

kumarvaibhav199 · ‎12-27-2016

Every field is an integer or a float here, so I gave int to all of them.

arunak · ‎12-27-2016

To add to @milind pandit's point: I tried opening the AirPassengers file. The first column is enclosed in quotes. The same is true for BJsales.csv.
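
If you stay with PigStorage, the quotes remain part of the field value, so a direct cast of that column to int would not succeed. One possible workaround, sketched here under the assumption that the first column holds values such as "1", "2", and so on, is to load it as chararray and strip the quotes with the built-in REPLACE function before casting. The relation names are hypothetical.

```pig
-- Sketch only: assumes the first column is a quoted number such as "1".
raw = LOAD 'BJsales.csv' USING PigStorage(',') AS (Num:chararray, time:int, BJsales:float);

-- Remove the double quotes, then cast the cleaned string to int.
clean = FOREACH raw GENERATE (int)REPLACE(Num, '"', '') AS Num, time, BJsales;

-- The header row still has to be dropped; its time field fails the int cast and is null.
data = FILTER clean BY time IS NOT NULL;
```

This only handles plain quoting around a single field. Quoted fields that contain commas would still be split incorrectly by PigStorage.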

Cloudera Community

Pig Error : ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing