Support Questions
Compression is not happening in hive using parquet file format.

Rising Star

I was creating a table in Hive using Beeline, and I need to compress my data using the PARQUET file format.

So I tried using set parquet.compression=SNAPPY;

But while executing this command I got the following error:

Error: Error while processing statement: Cannot modify parquet.compression at runtime. It is not in list of params that are allowed to be modified at runtime (state=42000,code=1)

I checked, and this property is not present in the whitelist of parameters, and we don't have permission to edit the whitelist.

As a resolution, instead of using set parquet.compression=SNAPPY; at runtime, I used the table property TBLPROPERTIES ('PARQUET.COMPRESS'='SNAPPY'), and the table was created successfully.
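The workaround above looked roughly like this (the database, table, and column names are placeholders; note this uses the 'PARQUET.COMPRESS' key that, as described later in the thread, did not actually compress the data):

```sql
-- Declare compression in the table properties instead of a runtime
-- "set" command, since parquet.compression is not whitelisted.
-- (example_db.sales_snappy and its columns are hypothetical names)
CREATE TABLE example_db.sales_snappy (
  id     BIGINT,
  amount DOUBLE
)
STORED AS PARQUET
TBLPROPERTIES ('PARQUET.COMPRESS'='SNAPPY');
```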

But when I loaded data into the table and compared it (using describe table) with another table where I did not use compression, the data size was the same.

So it seems that with 'PARQUET.COMPRESS'='SNAPPY' the compression is not happening.

Is there any other property we need to set to get compression working?

For Avro, I have seen the following two properties set to enable compression:

hive> set hive.exec.compress.output=true;

hive> set avro.output.codec=snappy;

Likewise, do I need to set some other property for the Parquet file format?

1 ACCEPTED SOLUTION

Rising Star

It's working now with 'PARQUET.COMPRESSION'='SNAPPY'.
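A minimal sketch of the accepted fix, using hypothetical database, table, and column names:

```sql
-- 'parquet.compression' (not 'parquet.compress') is the property key
-- that Hive's Parquet integration reads when writing data files.
-- (example_db.sales_snappy and its columns are placeholder names)
CREATE TABLE example_db.sales_snappy (
  id     BIGINT,
  amount DOUBLE
)
STORED AS PARQUET
TBLPROPERTIES ('parquet.compression'='SNAPPY');
```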


2 REPLIES

Guru

@Yukti Agrawal

How much data did you insert when comparing the two tables? Can you try with a substantially larger data set? Snappy is not very aggressive at reducing size; it is optimized for fast compress/decompress operations.
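One way to compare the on-disk sizes of the two tables, assuming hypothetical table names, is to look at the totalSize entry in Hive's detailed table information:

```sql
-- After loading the same data into both tables, compare the
-- "totalSize" row (bytes on disk) under Table Parameters.
DESCRIBE FORMATTED example_db.sales_snappy;
DESCRIBE FORMATTED example_db.sales_uncompressed;

-- If the statistics look stale, refresh them first, e.g.:
-- ANALYZE TABLE example_db.sales_snappy COMPUTE STATISTICS;
```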

Rising Star

It's working now with 'PARQUET.COMPRESSION'='SNAPPY'.