
Pig --> Header first column is not showing

Expert Contributor

Hi,

I wrote a small Pig script that loads a simple data set into a relation and dumps it to the console. When I dump it, the first column name in the header row is not shown. Is there anything wrong with my script?

Data Set:

e_id,fname,mname,lname,age,gender,address,city,state,zip
1,John,m,Smith,35,M,,Princeton,NJ,08536
2,James,S,Clark,M,,Princeton,NJ,08536

Script:

A = LOAD '/user/cloudera/sat/data/sampledata1.csv' using PigStorage(',') as (e_id:int,fname:chararray,mname:chararray,lname:chararray,age:chararray,gender:chararray,address:chararray,city:chararray,state:chararray,zip:int);
dump A;

Output:

(,fname,mname,lname,age,gender,address,city,state,)
(1,John,m,Smith,35,M,,Princeton,NJ,8536)
(2,James,S,Clark,M,,Princeton,NJ,08536,)

2 REPLIES

Guru

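You have declared the first field as an int. The e_id values in your data rows are ints, so they are displayed. In the header row, however, that field is the string "e_id", which Pig cannot cast to int. Instead of failing, Pig returns null for that field, and dump prints null as an empty value. The same thing happens to the last field, zip, which is also declared as int.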
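One way to see this for yourself, as a rough sketch rather than the only fix: load the int fields as chararray so that no cast is attempted on the header row, then filter the header out by its literal first value. The relation name B is just an illustration; the path and field names are taken from your script.

```pig
-- Declare e_id and zip as chararray so the header strings are not cast to int
-- (a failed cast is what turned them into nulls in the original output).
A = LOAD '/user/cloudera/sat/data/sampledata1.csv' USING PigStorage(',')
    AS (e_id:chararray, fname:chararray, mname:chararray, lname:chararray,
        age:chararray, gender:chararray, address:chararray, city:chararray,
        state:chararray, zip:chararray);

-- Drop the header row by matching the literal column name in the first field.
B = FILTER A BY e_id != 'e_id';

DUMP B;
```

Keeping zip as chararray also preserves the leading zero in values such as 08536. If you need e_id as a number later, you can cast it in a FOREACH after the header row has been filtered out.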

If you use Piggybank, you can skip the header: http://stackoverflow.com/questions/29335656/hadoop-pig-removing-csv-header
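A minimal sketch of the Piggybank approach, assuming your Pig release ships a Piggybank whose CSVExcelStorage supports the header option. The jar location below is a placeholder and depends on your installation.

```pig
-- Placeholder path: point this at the piggybank.jar of your installation.
REGISTER /path/to/piggybank.jar;

-- Arguments: delimiter, multiline handling, end-of-line style, header handling.
-- 'SKIP_INPUT_HEADER' tells the loader to drop the first line of the input.
A = LOAD '/user/cloudera/sat/data/sampledata1.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(
        ',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
    AS (e_id:int, fname:chararray, mname:chararray, lname:chararray,
        age:chararray, gender:chararray, address:chararray, city:chararray,
        state:chararray, zip:int);

DUMP A;
```

With the header skipped at load time, the int declarations for e_id and zip no longer meet a string they cannot cast. Note that zip:int still drops the leading zero (08536 becomes 8536); declare it as chararray if you need the value exactly as written.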
