Change default Hive compression codec

New Contributor

Hi Cloudera Community,

How can I change Hive's compression codec at runtime? I'm reading a table in Avro format compressed with Snappy, and I'm trying to write a similar table that is also compressed with Snappy, but the result is compressed with "deflate". I tried multiple options, and the resulting files were always compressed with the same codec.

Can you help me identify the problem in the following statements, or tell me what I can do to set Hive's compression codec at runtime?

 


SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE EXTERNAL TABLE IF NOT EXISTS tableX PARTITIONED BY (year INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs:///AAA/BBB/CCC/tableX.avsc');

ALTER TABLE tableX ADD IF NOT EXISTS PARTITION (year = 2016)
LOCATION 'hdfs://nameservice/AAA/BBB/CCC/2016';

INSERT OVERWRITE TABLE tableX PARTITION (year = 2016)
SELECT id, name, email
FROM tablaY
WHERE year = 2016;

 

Best Regards, 

 

Esteban

1 ACCEPTED SOLUTION

Mentor
Quoted from the documentation on using Avro files at https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_avro_usage.html#topic_26_2:

"""
Hive
(…)
To enable Snappy compression on output [avro] files, run the following before writing to the table:

SET hive.exec.compress.output=true;
SET avro.output.codec=snappy;
"""

Please try this out. You're missing only the second property mentioned here, which appears specific to Avro serialization in Hive.

When output compression is enabled and avro.output.codec is not set, Hive's Avro output format falls back to deflate, which explains the behaviour you observe without it.
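
Applied to the statements in the question, the change might look like the sketch below. It reuses the table names from the question (tableX, tablaY) and assumes the table and partition already exist as created there; treat it as an illustration, not a tested script for any particular Hive version.

```sql
-- Turn on compressed output for this session.
SET hive.exec.compress.output=true;
-- Codec for Avro container files written by Hive; this is the
-- property that was missing from the original statements.
SET avro.output.codec=snappy;

-- Same insert as in the question, unchanged.
INSERT OVERWRITE TABLE tableX PARTITION (year = 2016)
SELECT id, name, email
FROM tablaY
WHERE year = 2016;
```

One way to confirm the result is to inspect the avro.codec entry in the header metadata of one of the written files, for example with the getmeta command of avro-tools, if that tool is available on your cluster.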
