How to use getmerge and exclude headers in CSV files?

Hi

I am using getmerge to combine multiple files like this:

hdfs dfs -getmerge /user/maria_dev/Folder3/* /Folder3/output1.csv

How can I exclude the header of each file? When I load the merged file into a Hive table, the header row of every source file shows up as a data row.

Alternatively, is there a query in Hive that excludes the header rows? If I merge 2 files and load the result into Hive, I get 2 header lines, and so on for more files.

When I created my table, I included the following:

TBLPROPERTIES ("skip.header.line.count"="1");

However, this only skips the first line. How can I exclude the rest of the headers?

Thanks

1 REPLY

If you write the data out with INSERT OVERWRITE ... SELECT, the output will not contain a header row, even if header printing (e.g. hive.cli.print.header) is set to true.

For example:

INSERT OVERWRITE DIRECTORY '${HDFSLocation}' ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' NULL DEFINED AS '' SELECT col1, col2, col3 FROM data_base.table;

 

The resulting file (or files) will not contain a header row.
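
As far as I know, getmerge itself has no option to skip header lines, so another route is to strip the repeated headers while merging. The following is only a minimal sketch, assuming the CSV files have been copied to the local filesystem and each one starts with exactly one header line. The function name merge_csv_keep_one_header is hypothetical, not part of Hadoop or Hive.

```shell
# Hypothetical helper (not a Hadoop or Hive command).
# Merges local CSV files into one output file, keeping the header
# line of the first input only.
# Usage: merge_csv_keep_one_header OUTPUT INPUT...
merge_csv_keep_one_header() {
    out=$1
    shift
    # Refuse to run without inputs, otherwise awk would wait on stdin.
    [ "$#" -ge 1 ] || return 1
    # FNR is the line number within the current file, NR the line number
    # across all files. Skip line 1 of every file except the very first.
    awk 'FNR == 1 && NR != 1 { next } { print }' "$@" > "$out"
}
```

It could be called as merge_csv_keep_one_header output1.csv part1.csv part2.csv. Because one header line remains at the top, a table created with "skip.header.line.count"="1" should then skip it as usual. For files that live in HDFS, one option is to copy them down with hdfs dfs -get first, run the function on the local copies, and upload the result with hdfs dfs -put, assuming the data fits on local disk.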