Parquet table snappy compressed by default

Contributor

Hi,

 

1) If we create a table (in either Hive or Impala) and specify only STORED AS PARQUET, will it be Snappy-compressed by default in CDH?

 

2) If not, how do I tell a Parquet table with Snappy compression apart from a Parquet table without it?

 

Also, how do I specify Snappy compression at the table level when creating a table, and at the global level, so that every table stored as Parquet is Snappy-compressed even if nobody specifies it at the table level?

 

Please help

 

11 REPLIES

Champion

If tables are created STORED AS PARQUET in Hive, will they use the Snappy codec or not?

According to Cloudera, most CDH components that write Parquet files do not compress them by default.

 

For ORC format:

CREATE TABLE testingsnappy_orc
STORED AS ORC
TBLPROPERTIES("orc.compress"="snappy")
AS SELECT * FROM sourcetable;

For Parquet format:

The statement is the same, but use STORED AS PARQUET and the Parquet table property instead of the ORC one:

TBLPROPERTIES ("parquet.compression"="SNAPPY");
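
To make the Hive side concrete, here is a minimal sketch of the full Parquet statement, together with a session-level default and a way to check the table property afterwards. The table name testingsnappy_parquet is hypothetical and sourcetable is carried over from the ORC example. This assumes a Hive version that honors the parquet.compression property; treat it as a starting point rather than a verified recipe for your CDH release.

```sql
-- Session-level default: Parquet files written by this Hive session use Snappy
-- (assumes your Hive version honors parquet.compression).
SET parquet.compression=SNAPPY;

-- Table-level setting: testingsnappy_parquet is a hypothetical table name.
CREATE TABLE testingsnappy_parquet
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="SNAPPY")
AS SELECT * FROM sourcetable;

-- Check whether the property was recorded on the table.
SHOW TBLPROPERTIES testingsnappy_parquet ("parquet.compression");
```

Note that SHOW TBLPROPERTIES only reports what was declared on the table. Files written under a session-level setting, or written by Impala, can be compressed without the property being present, so the data files themselves remain the authoritative place to check.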
 

Expert Contributor

By default, in Hive, Parquet files are not written with compression enabled.

 

https://issues.apache.org/jira/browse/HIVE-11912

 

However, writing files with Impala into a Parquet table will create files with internal Snappy compression (by default).
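
For the Impala side, a minimal sketch of how the codec can be controlled per session with the COMPRESSION_CODEC query option (older Impala releases used the name PARQUET_COMPRESSION_CODEC). The table names parquet_snappy_tbl and parquet_plain_tbl are hypothetical, and sourcetable is borrowed from the earlier reply.

```sql
-- Impala: snappy is already the default; setting it explicitly documents the intent.
SET COMPRESSION_CODEC=snappy;

-- parquet_snappy_tbl is a hypothetical table name.
CREATE TABLE parquet_snappy_tbl
STORED AS PARQUET
AS SELECT * FROM sourcetable;

-- To write uncompressed Parquet files instead, switch the option off first.
SET COMPRESSION_CODEC=none;

-- parquet_plain_tbl is a hypothetical table name.
CREATE TABLE parquet_plain_tbl
STORED AS PARQUET
AS SELECT * FROM sourcetable;
```

The option applies only to the current session, and only to files written after it is set. Existing data files keep whatever codec they were written with.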