Created 01-13-2017 11:26 AM
I was creating a table in Hive using Beeline, and I need to compress its data, which is stored in the Parquet file format.
So I tried to use set parquet.compression=SNAPPY;
But when executing this command I get the following error:
Error: Error while processing statement: Cannot modify parquet.compression at runtime. It is not in list of params that are allowed to be modified at runtime (state=42000,code=1)
I checked, and this property is not in the whitelist of parameters, and we don't have permission to edit the whitelist.
As a workaround, instead of running set parquet.compression=SNAPPY; at runtime, I used the table property TBLPROPERTIES ('PARQUET.COMPRESS'='SNAPPY'). With that, the table is created successfully.
But when I loaded data into the table and used describe table to compare it with another table that I created without compression, the data size is the same.
So that means compression is not happening with 'PARQUET.COMPRESS'='SNAPPY'.
Is there any other property we need to set to get the compression done?
For Avro, I have seen that the two properties below need to be set to enable compression:
hive> set hive.exec.compress.output=true;
hive> set avro.output.codec=snappy;
Likewise, do I need to set some other property for Parquet files?
Created 01-17-2017 06:19 AM
It is working now with 'PARQUET.COMPRESSION'='SNAPPY'.
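For anyone landing here later, a minimal sketch of what the working DDL could look like. The table and column names (sales_parquet_snappy, sales_raw, id, amount) are made up for illustration, and STORED AS PARQUET assumes Hive 0.13 or later. Note that the Hive documentation writes the key in lowercase as 'parquet.compression', so that form is used here; the post above reports the uppercase spelling working as well.

```sql
-- Hypothetical table and column names; only the TBLPROPERTIES entry
-- reflects the fix described in this thread.
CREATE TABLE sales_parquet_snappy (
  id     BIGINT,
  amount DOUBLE
)
STORED AS PARQUET
TBLPROPERTIES ('parquet.compression'='SNAPPY');

-- Populate the table through Hive so the Parquet files are written
-- with the codec set above. sales_raw is an assumed existing table.
INSERT INTO TABLE sales_parquet_snappy
SELECT id, amount
FROM sales_raw;
```

Because the codec is set as a table property rather than with a session-level set command, this does not need the property to be in the runtime whitelist.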
Created 01-13-2017 09:38 PM
How much data did you insert to compare the two tables? Can you try it with a substantially bigger data set? Snappy is not very aggressive at reducing size; it is optimized for fast compress/decompress operations.
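One way to run that comparison, as a rough sketch: refresh the table statistics and then look at totalSize in the Table Parameters section of the DESCRIBE FORMATTED output for each table. The two table names below are hypothetical placeholders for the compressed and uncompressed tables.

```sql
-- Hypothetical table names: one created with the Snappy table property,
-- one created without it, both loaded with the same data.
ANALYZE TABLE sales_parquet_snappy COMPUTE STATISTICS;
ANALYZE TABLE sales_parquet_plain COMPUTE STATISTICS;

-- Compare totalSize (bytes on disk) under "Table Parameters" in each output.
DESCRIBE FORMATTED sales_parquet_snappy;
DESCRIBE FORMATTED sales_parquet_plain;
```

With only a handful of rows the two totalSize values can be nearly identical, so the difference is easier to see on a larger load.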