
How to use getmerge and exclude headers in csv files?


Hi

I am using getmerge to combine multiple files like this:

hdfs dfs -getmerge /user/maria_dev/Folder3/* /Folder3/output1.csv

How can I exclude the header of each file? When I load the merged file into a Hive table, the header row of every source file shows up as a data row.

Alternatively, is there a Hive query that excludes the header rows? If I merge two files and load the result into Hive, I get two header lines, and so on.

When I created my table, I included the following:

TBLPROPERTIES ("skip.header.line.count"="1");

However, this only skips the first line of the merged file. How can I exclude the remaining headers?

Thanks

1 REPLY

If you write the data out with INSERT OVERWRITE ... SELECT, the result will not contain a header, even if you have set the header option to true.

For example:

INSERT OVERWRITE DIRECTORY '${HDFSLocation}' ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' NULL DEFINED AS '' SELECT col1, col2, col2 FROM data_base.table;

 

The resulting file or files will not contain headers.
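
If the merge has to happen outside Hive, one possible alternative is to strip the headers while concatenating the files locally. The sketch below is not part of the reply above. It assumes the CSV files have already been copied to the local filesystem (for example with hdfs dfs -get) and that every file starts with exactly one header line. The function names are hypothetical.

```shell
# Merge CSV files, keeping the header line from the first file only.
# Usage: merge_csv_keep_first_header OUTPUT INPUT...
# Assumption: each input file begins with exactly one header line.
merge_csv_keep_first_header() {
    out="$1"
    shift
    # FNR is the line number within the current file, NR the overall count.
    # Skip line 1 of every file except the very first line read.
    awk 'FNR == 1 && NR != 1 { next } { print }' "$@" > "$out"
}

# Merge CSV files, dropping the header line of every file.
# Usage: merge_csv_drop_headers OUTPUT INPUT...
merge_csv_drop_headers() {
    out="$1"
    shift
    # Print only lines after the first line of each file.
    awk 'FNR > 1' "$@" > "$out"
}
```

For instance, after copying the folder locally, merge_csv_drop_headers output1.csv Folder3/*.csv would produce a file with no header rows at all, while merge_csv_keep_first_header keeps a single header that skip.header.line.count can then skip. Write the output outside the input folder so it is not picked up by the glob.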