Created 12-29-2016 04:12 AM
Hi, I am planning to take the HDPCD certification exam this week. In the practice exam on Amazon Web Services, the flight_delays1.csv file contains data with a header. In the exam, will I need to remove the header manually?
Created 01-10-2017 07:13 AM
In the exam you may or may not be required to remove the header.
It is best to know how to do it, so that you are comfortable either way.
To remove header in Hive use tblproperties:
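CREATE TABLE test (name STRING, email STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' TBLPROPERTIES ("skip.header.line.count"="1");
-- Now load the data into the table. The ROW FORMAT clause is needed for a comma-separated file, because Hive's default field delimiter is Ctrl-A.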
To remove header in Pig:
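A = LOAD 'data.csv' USING PigStorage(',');
B = FILTER A BY $0 > 1;
Note that this filter only works when the first column is numeric and every data value in it is greater than 1 (a year column, for example). The header text cannot be cast to a number, so the comparison evaluates to null and the header row is dropped.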
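A filter that compares the first field against the header label itself does not depend on the column's type. Here is a minimal sketch, assuming a two-column data.csv with the header "name,email" (the same columns as the Hive table above) and assuming no real data row has the literal value 'name' in its first field. The relation names raw and no_header are only illustrative.

```pig
-- Assumed schema: two text columns, matching the Hive table above.
raw = LOAD 'data.csv' USING PigStorage(',') AS (name:chararray, email:chararray);

-- Keep a row unless its first field is exactly the header label 'name'.
-- The IS NULL check keeps rows with a null first field, which a bare
-- != comparison would silently drop.
no_header = FILTER raw BY (name IS NULL) OR (name != 'name');

DUMP no_header;
```

For the exam dataset you would swap in the real column names and the real header label of the first column.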
Created 01-10-2017 04:37 PM
I did it the same way: I loaded the data into a bag with Pig and used FILTER to drop the top row.
Good luck.
Created 03-16-2017 03:13 PM
Please consider accepting the answer if it has helped you at all.
Thank you.