Created 05-14-2017 12:23 PM
Hi,
I wrote a small Pig script that loads a simple data set into a relation and dumps it to the console. When I dump it, the first column name of the header (e_id) is not shown. Is there anything wrong with my script?
Data Set:
e_id,fname,mname,lname,age,gender,address,city,state,zip
1,John,m,Smith,35,M,,Princeton,NJ,08536
2,James,S,Clark,M,,Princeton,NJ,08536
Script:
A = LOAD '/user/cloudera/sat/data/sampledata1.csv' using PigStorage(',') as (e_id:int,fname:chararray,mname:chararray,lname:chararray,age:chararray,gender:chararray,address:chararray,city:chararray,state:chararray,zip:int);
dump A;
Output:
(,fname,mname,lname,age,gender,address,city,state,)
(1,John,m,Smith,35,M,,Princeton,NJ,8536)
(2,James,S,Clark,M,,Princeton,NJ,08536,)
Created 05-14-2017 01:43 PM
Hi @Satish S
In the output I can see "(,fname,mname,lname,age,gender,address,city,state,)", which I believe is the header row of the file. e_id and zip are missing from that row because you declared e_id and zip as int, which cannot hold character data, so nothing is displayed for them. PigStorage doesn't know whether the first row is a header. Unless you handle it yourself, the row is treated as data rather than as a file header. Hope it helps.
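One way to handle the header yourself, as a minimal sketch: load every column as chararray, filter out the row whose first field is the literal header text, then cast e_id. The relation names raw and no_header are made up for illustration, and keeping zip as chararray is my own choice so that the leading zero in 08536 is preserved.

```pig
-- Load all columns as text so the header row survives without failed casts.
raw = LOAD '/user/cloudera/sat/data/sampledata1.csv' USING PigStorage(',')
      AS (e_id:chararray, fname:chararray, mname:chararray, lname:chararray,
          age:chararray, gender:chararray, address:chararray, city:chararray,
          state:chararray, zip:chararray);

-- Drop the header row: it is the only row whose first field is the text 'e_id'.
no_header = FILTER raw BY e_id != 'e_id';

-- Cast e_id to int now that only data rows remain.
A = FOREACH no_header GENERATE (int)e_id AS e_id, fname, mname, lname, age,
                               gender, address, city, state, zip;

DUMP A;
```

This only removes the header. The second data row in the sample is still one field short (the age value is missing), so its columns will remain shifted.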
Created 05-14-2017 01:38 PM
You have defined the first field as an int. The e_id values in your data rows are ints, so you see them. The header value, however, is a chararray. Pig cannot cast it from string to int, so instead of failing it returns null, which dump prints as an empty field.
If you use Piggybank, you can skip the header: http://stackoverflow.com/questions/29335656/hadoop-pig-removing-csv-header
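A rough sketch of the Piggybank approach, assuming your Pig version ships a CSVExcelStorage that accepts the header-treatment argument. The jar path below is a placeholder, so replace it with wherever piggybank.jar lives on your cluster.

```pig
-- Placeholder path: point this at the piggybank.jar of your installation.
REGISTER /path/to/piggybank.jar;

-- Arguments: delimiter, multiline treatment, end-of-line treatment, header treatment.
-- 'SKIP_INPUT_HEADER' tells the loader to discard the first line of the input.
A = LOAD '/user/cloudera/sat/data/sampledata1.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'NOCHANGE', 'SKIP_INPUT_HEADER')
    AS (e_id:int, fname:chararray, mname:chararray, lname:chararray,
        age:chararray, gender:chararray, address:chararray, city:chararray,
        state:chararray, zip:int);

DUMP A;
```

With the header skipped, the int declaration for e_id no longer produces an empty first row. Declaring zip as int still drops the leading zero, as in the 8536 of the original output.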