
Change default Hive compression codec


New Contributor

Hi Cloudera Community,

How can I change Hive's compression codec at runtime? I'm reading a table in Avro format compressed with Snappy, and I'm trying to write a similar table that is also compressed with Snappy, but the result is compressed with "deflate". I tried multiple options, and the resulting files were always compressed with that same codec.

Can you help me identify the problem in the following statements, or tell me what I can do to set Hive's compression codec at runtime?

 


"set hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE external table IF NOT EXISTS tableX partitioned by (year Int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs:///AAA/BBB/CCC/tableX.avsc');

alter table tableX add if not exists partition (year = 2016)
location 'hdfs://nameservice/AAA/BBB/CCC/2016';

insert overwrite table tableX partition (year = 2016) SELECT
id, name, email
FROM tablaY WHERE year = 2016;"

 

Best Regards, 

 

Esteban

1 ACCEPTED SOLUTION


Re: Change default Hive compression codec

Master Guru
Quoted from documentation about using Avro files at https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_avro_usage.html#topic_26_2

"""
Hive
(…)
To enable Snappy compression on output [avro] files, run the following before writing to the table:

SET hive.exec.compress.output=true;
SET avro.output.codec=snappy;
"""

Please try this out. You're missing only the second property mentioned here, which appears specific to Avro serialization in Hive.

Avro's default compression codec is deflate, which explains the behaviour you observe when that property is not set.
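
For reference, a minimal sketch of how the session settings from the question might look with the Avro-specific property added. It reuses the table names and partition from the original post and assumes tableX and its 2016 partition already exist; the mapred.output.compression.* settings are left out on the assumption that they do not influence the codec the Avro container writer picks.

```sql
-- Turn on compressed output for this session.
SET hive.exec.compress.output=true;
-- Codec for Avro container files; when unset, Avro output uses deflate.
SET avro.output.codec=snappy;
-- Carried over unchanged from the original post.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Rewrite the partition so the new files are written with the settings above.
INSERT OVERWRITE TABLE tableX PARTITION (year = 2016)
SELECT id, name, email
FROM tablaY
WHERE year = 2016;
```

These SET statements apply only to the current session, so they need to be run again in every session that writes to the table.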