Pig --> Header first column is not showing
Labels: Apache Pig
Created ‎05-14-2017 12:23 PM
Hi,
I wrote a small Pig script that loads a simple data set into a relation and dumps it to the console. When I dump it, the first column name of the header row is not shown. Is there anything wrong with my script?
Data Set:
e_id,fname,mname,lname,age,gender,address,city,state,zip
1,John,m,Smith,35,M,,Princeton,NJ,08536
2,James,S,Clark,M,,Princeton,NJ,08536
Script:
A = LOAD '/user/cloudera/sat/data/sampledata1.csv' using PigStorage(',') as (e_id:int,fname:chararray,mname:chararray,lname:chararray,age:chararray,gender:chararray,address:chararray,city:chararray,state:chararray,zip:int);
dump A;
Output:
(,fname,mname,lname,age,gender,address,city,state,)
(1,John,m,Smith,35,M,,Princeton,NJ,8536)
(2,James,S,Clark,M,,Princeton,NJ,08536,)
Created ‎05-14-2017 01:43 PM
Hi @Satish S
The row "(,fname,mname,lname,age,gender,address,city,state,)" in your output is the header line of the file. e_id and zip are missing from it because you declared those two fields as int, and the header strings "e_id" and "zip" cannot be cast to int, so they are displayed as empty. PigStorage does not know that the first row is a header. Unless you handle it yourself, it is treated as data like any other row. Hope it helps.
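One way to handle the header yourself with plain PigStorage is sketched below. It is only a sketch under some assumptions: it reuses the path and field names from the question, it assumes that no real data row has the literal value e_id in its first field, and the relation names raw and no_header are made up for illustration. The idea is to load every field as chararray so that nothing is cast during the load, filter out the header row, and cast e_id afterwards.

```pig
-- Load all fields as chararray so the header row is not cast during LOAD.
raw = LOAD '/user/cloudera/sat/data/sampledata1.csv' USING PigStorage(',')
      AS (e_id:chararray, fname:chararray, mname:chararray, lname:chararray,
          age:chararray, gender:chararray, address:chararray, city:chararray,
          state:chararray, zip:chararray);

-- Drop the header row by comparing its first field with the column name.
-- Assumption: no data row carries the literal string 'e_id' in this field.
no_header = FILTER raw BY e_id != 'e_id';

-- Cast e_id to int only after the header row is gone.
A = FOREACH no_header GENERATE (int)e_id AS e_id, fname, mname, lname, age,
    gender, address, city, state, zip;

DUMP A;
```

zip is left as chararray here on purpose: declared as int, 08536 loses its leading zero and prints as 8536, as the output in the question shows.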
Created ‎05-14-2017 01:38 PM
You have defined the first field as an int. The e_id values in your data rows are ints, so you see them. The header value is a chararray, though, and Pig does not raise an error for the failed cast (string to int). It returns null instead, which is shown as an empty field.
If you use Piggybank, you can skip the header: http://stackoverflow.com/questions/29335656/hadoop-pig-removing-csv-header
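A minimal sketch of the Piggybank route follows, assuming a Pig release whose Piggybank ships CSVExcelStorage with the SKIP_INPUT_HEADER option. The jar location in the REGISTER line is a placeholder, not a real path, and the load path and schema are copied from the question.

```pig
-- Placeholder path: point this at the piggybank.jar of your installation.
REGISTER /path/to/piggybank.jar;

-- CSVExcelStorage arguments: delimiter, multiline handling, line-ending style,
-- header handling. SKIP_INPUT_HEADER drops the first line of the input.
A = LOAD '/user/cloudera/sat/data/sampledata1.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(
        ',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
    AS (e_id:int, fname:chararray, mname:chararray, lname:chararray,
        age:chararray, gender:chararray, address:chararray, city:chararray,
        state:chararray, zip:int);

DUMP A;
```

With the header skipped by the loader, the int declarations for e_id and zip only ever see data rows, so the row of column names should no longer appear in the dump.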
