
Include header in Hive gzip output

Explorer
How do I make sure each .gz file is created with a header? I am setting the properties below, which give me multiple output files named 00000_0.gz, 00001_0.gz, 00002_0.gz, etc., but they have no header. What syntax do I need to force a header in each file?

Properties currently set:

set mapred.output.compress=true;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
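Hive itself does not write a header row with INSERT OVERWRITE, so one option is to post-process the part files after Hive writes them, prepending a header line to each .gz. A minimal sketch, assuming the column names are known up front (the HEADER value, directory, and stand-in part file below are assumptions, not output from this job):

```shell
#!/bin/sh
# Hypothetical post-processing sketch: prepend a header line to every
# gzip part file that Hive wrote. HEADER and the directory are assumptions.
HEADER='col1|col2|col3'
mkdir -p /tmp/demo_parts
# Stand-in for a Hive-written part file (normally 000000_0.gz etc.):
printf '1|a|x\n2|b|y\n' | gzip > /tmp/demo_parts/000000_0.gz
for f in /tmp/demo_parts/*.gz; do
  # Concatenate the header and the decompressed data, recompress, replace.
  { echo "$HEADER"; zcat "$f"; } | gzip > "$f.tmp" && mv "$f.tmp" "$f"
done
zcat /tmp/demo_parts/000000_0.gz | head -n1   # first line is now the header
```

Because gzip streams can be decompressed and recompressed losslessly, this preserves the data while adding the header as the first line of each file.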
4 REPLIES

Re: Include header in Hive gzip output

Champion
set hive.cli.print.header=true;

If I understand correctly, you should try setting the above property to print the column header.

 

Note that this only works with SELECT output to the console, not with

INSERT OVERWRITE LOCAL
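Since hive.cli.print.header=true only affects console output, one common workaround is to skip INSERT OVERWRITE and instead pipe the CLI output through gzip yourself. A sketch, with the hive invocation shown as a comment because it depends on your environment (printf stands in for the query output so the pipeline shape is runnable):

```shell
# Sketch: get the header into the .gz by compressing the CLI output directly.
# On a real cluster this would be something like:
#   hive -e "set hive.cli.print.header=true; SELECT ... FROM my_table" \
#     | gzip > /tmp/my_table.gz
# printf is a stand-in for the hive output here:
printf 'col1|col2\n1|a\n2|b\n' | gzip > /tmp/my_table_demo.gz
zcat /tmp/my_table_demo.gz | head -n1   # the header is the first line
```

The trade-off is that this produces a single file through one client process rather than parallel part files from the mappers.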

 

Re: Include header in Hive gzip output

Explorer

Thanks csguna, but that doesn't work when using INSERT OVERWRITE:

 

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/target_dir/' ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' SELECT ...

 

Re: Include header in Hive gzip output

Champion

Could you please let me know if you are running this in the CLI?

Can you share the error?


Re: Include header in Hive gzip output

Explorer

Via CLI ...

 

 

hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/my_table/' ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' SELECT fields FROM data_lake.my_table WHERE partition_datetime LIKE '2016-09%';
Query ID = user_me_20170405194343_c5066b7b-6e17-415b-aa32-29fea1eae434
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/jars/parquet-format-2.1.0-cdh5.10.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/jars/parquet-pig-bundle-1.5.0-cdh5.10.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/jars/parquet-hadoop-bundle-1.5.0-cdh5.10.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/jars/hive-exec-1.1.0-cdh5.10.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/jars/hive-jdbc-1.1.0-cdh5.10.0-standalone.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [shaded.parquet.org.slf4j.helpers.NOPLoggerFactory]
Starting Job = job_1490821219264_2176, Tracking URL = http://ip-172-16-12-12.us-west-1.compute.internal:8088/proxy/application_c/
Kill Command = /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/bin/hadoop job  -kill job_1481221219662_2176
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-04-05 19:43:28,168 Stage-1 map = 0%,  reduce = 0%
2017-04-05 19:43:37,612 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.6 sec
MapReduce Total cumulative CPU time: 1 seconds 600 msec
Ended Job = job_1490821219264_2176
Copying data to local directory /tmp/my_table
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.6 sec   HDFS Read: 6317 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 600 msec
OK
Time taken: 25.453 seconds

But this produced only an empty file, with no .gz extension:

[user_me@machine ./my_table]>> ls -l
total 0
-rw-r--r-- 1 user_me xyz 0 Apr  5 19:43 000000_0
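For what it's worth, a zero-byte 000000_0 with no .gz extension suggests the compression properties were not in effect for that session, on top of the header question. A quick sanity check on the output directory might look like this (a sketch; the directory and stand-in files are assumptions):

```shell
# Check whether the files Hive wrote are non-empty and actually gzip data.
mkdir -p /tmp/check_demo
: > /tmp/check_demo/000000_0                   # stand-in for the empty file above
printf 'x\n' | gzip > /tmp/check_demo/000001_0.gz
for f in /tmp/check_demo/*; do
  if [ ! -s "$f" ]; then
    echo "$f: empty"
  elif gzip -t "$f" 2>/dev/null; then
    echo "$f: valid gzip"
  else
    echo "$f: not gzip"
  fi
done
```

If the files come back empty or uncompressed, it is worth re-running the `set` statements in the same CLI session as the INSERT, since they only apply to the session in which they are set.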