I have created a Hive Avro-based table with Snappy compression. The size of the Avro file is 2628 MB. The data in the Hive Avro-based table without Snappy compression is 2296 MB. I created one more Avro Hive table with Snappy compression and loaded the same data, but there is no big change in the compressed size. Also, when I describe the table, the properties show Compressed as 'No'.
Please find the table properties below.
Table Parameters:
    COLUMN_STATS_ACCURATE    True
    avro.compress            SNAPPY
    transient_lastDdlTime    1486455066

# Storage Information
SerDe Library:         org.apache.hadoop.hive.serde2.avro.AvroSerDe
InputFormat:           org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
OutputFormat:          org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
Compressed:            No
Num Buckets:           -1
Bucket Columns:
Sort Columns:
Storage Desc Params:
    serialization.format     1
Could you please let me know how you loaded the data into the compressed table, i.e. the command you ran in Hive? After loading the data into the table, can you retrieve the data from it?
I have a 2.6 GB text file. I loaded it into a Hive table stored as text. From that text table I loaded the data into the Avro-based Hive table by running insert into table on the avro_hive table, which is the Snappy-compressed table. Please let me know if you need more details.
Can you share your table creation script? I am not sure how you specified Snappy compression in it.
If you specify "avro.compress=snappy" in TBLPROPERTIES, it will not work. You can try setting these on the command line instead: set hive.exec.compress.output=true; set avro.output.codec=snappy; and then check whether the data is indeed compressed.
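To check whether the files were actually compressed, one option is to look at the header of one of the table's data files. The Avro container format records the codec under the avro.codec metadata key, and a missing key means no compression ("null"). Below is a minimal Python sketch that uses only the standard library. It assumes you have first copied one data file from the table's directory to local disk (for example with hdfs dfs -get); the function names and the file path passed on the command line are only illustrative.

```python
import sys

MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag encoded)."""
    shift = 0
    result = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of Avro header")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def read_bytes(stream):
    """Read a length-prefixed byte string (used for both keys and values)."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_header_metadata(stream):
    """Return the header metadata map of an Avro container file."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:  # negative count is followed by the block size in bytes
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta


def avro_codec(stream):
    """Return the codec name from the header; 'null' means uncompressed."""
    meta = read_header_metadata(stream)
    return meta.get("avro.codec", b"null").decode("utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the path of a locally copied data file.
    with open(sys.argv[1], "rb") as f:
        print(avro_codec(f))
```

If you save this as, say, check_codec.py (a made-up name) and run it against a file written by the insert, an output of snappy means the blocks were written with Snappy, while null means the data was written uncompressed regardless of what the table properties say.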
Thanks Frank. I have tried both ways, but the compression ratio is still the same as the one given in the question.
Did you find an answer for this? I am facing the same issue. When I applied compression to an external table in text format, I could see the compression ratio change. But when I applied the same to Avro, by setting the following attributes in hive-site.xml and creating the table with "avro.compress=snappy" in TBLPROPERTIES, the compression ratio stayed the same. I am not sure whether compression is applied to this table. Is there any way to validate whether it is compressed or not?
"hive.exec.compress.output" : "true"
"hive.exec.compress.intermediate" : "true"
"avro.output.codec" : "snappy"