How to use getmerge and exclude headers in csv files?
- Labels: Apache Hive
Hi
I am using getmerge to combine multiple files like this:
hdfs dfs -getmerge /user/maria_dev/Folder3/* /Folder3/output1.csv
How can I exclude the header of each file? When I load the merged file into a Hive table, the header row of every source file shows up again as a data row.
Alternatively, is there a query in Hive that excludes the rows containing the header names? If I merge 2 files and load the result into Hive, I get 2 header lines, and so on.
When I created my table, I included the following:
TBLPROPERTIES ("skip.header.line.count"="1");
However, this only skips the first line. How can I exclude the rest of the headers?
Thanks
Created ‎10-22-2019 09:06 AM
If you write the data out with INSERT OVERWRITE ... SELECT, the result will not have a header, even if you set the header option to true.
For example:
INSERT OVERWRITE DIRECTORY '${HDFSLocation}' ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' NULL DEFINED AS '' SELECT col1, col2, col3 FROM data_base.table;
The file or files it creates will not contain headers.
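If you would rather clean up at the getmerge step, here is a minimal sketch that post-processes the merged local file. It assumes every source file starts with exactly the same header line and that no real data row is identical to that header. The function name `strip_repeated_headers` and the file names in the usage note are hypothetical, not something from the original post.

```shell
#!/bin/sh
# Hypothetical helper: keep the first line of the merged file (the header)
# and drop every later line that is byte-for-byte identical to it.
# Assumption: all source files share the same header line.
strip_repeated_headers() {
    awk 'NR == 1 { header = $0; print; next }   # remember and print the first header
         $0 != header                            # print only lines that differ from it
    ' "$1"
}
```

You could then run something like `hdfs dfs -getmerge /user/maria_dev/Folder3/* merged.csv` followed by `strip_repeated_headers merged.csv > output1.csv`. The output still has one header line at the top, so the `skip.header.line.count` setting of 1 on the table should then be enough.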
