Snappy compression on Avro-backed Hive table: data is not compressed after loading into the Hive table

I have created an Avro-based Hive table with Snappy compression. The size of the Avro file is 2628 MB. The data in the Avro-based Hive table without Snappy compression is 2296 MB. I created one more Avro-based Hive table with Snappy compression and loaded the same data, but there is no significant change in the size of the stored data. Also, when I describe the table, the output shows "Compressed: No".

Please find the table properties below.

Table Parameters:
    COLUMN_STATS_ACCURATE    True
    avro.compress            SNAPPY
    transient_lastDdlTime    1486455066

# Storage Information
SerDe Library:       org.apache.hadoop.hive.serde2.avro.AvroSerDe
InputFormat:         org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
OutputFormat:        org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
Compressed:          No
Num Buckets:         -1
Bucket Columns:      []
Sort Columns:        []
Storage Desc Params:
    serialization.format    1

6 REPLIES

Expert Contributor

Could you please let me know how you loaded the data into the compressed table, i.e. which Hive command you used? After loading the data into the table, can you retrieve the data from that table?

I have a 2.6 GB text file. I loaded it into a Hive table stored as text. From that text table, I loaded the data into the Avro-based Hive table, which is the Snappy-compressed table, using an insert into table statement. Please let me know if you need more details.

Expert Contributor

Can you share your table creation script? I am not sure how you specified Snappy compression in it.

Expert Contributor

If you specify "avro.compress=snappy" in TBLPROPERTIES, it will not work. Instead, you can try setting the following on the command line before running the insert: set hive.exec.compress.output=true; set avro.output.codec=snappy; and then check whether the data is actually compressed.

Thanks, Frank. I have tried both ways, but the compression ratio is still the same as the one reported in the question.

New Contributor

Did you find an answer to this? I am facing the same issue. When I applied compression to an external table in text format, I could see the compression ratio change. But when I applied the same settings to an Avro table, by setting the following attributes in hive-site.xml and creating the table with "avro.compress=snappy" in TBLPROPERTIES, the compression ratio stayed the same. I am not sure whether compression is applied to this table at all. Is there any way to validate whether it is compressed?

"hive.exec.compress.output" : "true"
"hive.exec.compress.intermediate" : "true"
"avro.output.codec" : "snappy"