Pig --> Header first column is not showing

Hi,

I wrote a small Pig script that loads a simple data set into a relation and dumps it to the console. When I dump it, the first column name of the header (e_id) is not shown. Is there anything wrong with my script?

Data Set:

e_id,fname,mname,lname,age,gender,address,city,state,zip
1,John,m,Smith,35,M,,Princeton,NJ,08536
2,James,S,Clark,M,,Princeton,NJ,08536

Script:

A = LOAD '/user/cloudera/sat/data/sampledata1.csv' USING PigStorage(',')
    AS (e_id:int, fname:chararray, mname:chararray, lname:chararray, age:chararray,
        gender:chararray, address:chararray, city:chararray, state:chararray, zip:int);
DUMP A;

Output:

(,fname,mname,lname,age,gender,address,city,state,)
(1,John,m,Smith,35,M,,Princeton,NJ,8536)
(2,James,S,Clark,M,,Princeton,NJ,08536,)

1 ACCEPTED SOLUTION

Hi @Satish S

In the output I can see "(,fname,mname,lname,age,gender,address,city,state,)", which I believe is the header row of the file. e_id and zip are missing from that row because you declared both fields as int: the header strings "e_id" and "zip" cannot be cast to int, so Pig returns null for them and they are displayed as empty. PigStorage doesn't know that the first row is a header. Unless you handle it yourself, it is treated as data rather than as a file header. Hope it helps.
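One way to handle the header yourself with plain PigStorage is to load every field as chararray, filter out the header row, and cast afterwards. The following is only a sketch under assumptions: the relation names raw, no_header, and typed are made up for illustration, and it assumes no data row has the literal value e_id in its first column. Keeping zip as chararray is a suggestion of mine, not part of the original script; it preserves the leading zero that the int cast drops (08536 becomes 8536 in the output above).

```pig
-- Load all columns as text so the header row is not turned into nulls
raw = LOAD '/user/cloudera/sat/data/sampledata1.csv' USING PigStorage(',')
      AS (e_id:chararray, fname:chararray, mname:chararray, lname:chararray,
          age:chararray, gender:chararray, address:chararray, city:chararray,
          state:chararray, zip:chararray);

-- Drop the header row by matching the literal column name in the first field
no_header = FILTER raw BY e_id != 'e_id';

-- Cast e_id to int now that only data rows remain; zip stays chararray (assumption)
typed = FOREACH no_header GENERATE (int)e_id AS e_id, fname, mname, lname,
        age, gender, address, city, state, zip;

DUMP typed;
```

Note that rows where e_id is null are also dropped by this filter, because a comparison against null does not evaluate to true in Pig.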

2 REPLIES

You have defined the first field as an int. The values in your data rows are ints, so you see them. But the header value is a string, and Pig handles the failed cast (string to int) by returning null, which is displayed as an empty field.

If you use Piggybank, you can skip the header: http://stackoverflow.com/questions/29335656/hadoop-pig-removing-csv-header
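A minimal sketch of the Piggybank approach, using CSVExcelStorage with its SKIP_INPUT_HEADER option. The jar path below is a placeholder that depends on your installation, and the option is only available in Pig versions whose Piggybank ships it, so treat this as something to verify against your cluster rather than a drop-in fix.

```pig
-- Placeholder path: point this at the piggybank.jar shipped with your Pig install
REGISTER /path/to/piggybank.jar;

-- Arguments: delimiter, multiline handling, end-of-line style, header handling
A = LOAD '/user/cloudera/sat/data/sampledata1.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(
        ',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
    AS (e_id:int, fname:chararray, mname:chararray, lname:chararray, age:chararray,
        gender:chararray, address:chararray, city:chararray, state:chararray, zip:int);

DUMP A;
```

With the header skipped by the loader, the int declarations for e_id and zip only ever see data rows.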
