Snappy compression on Avro-backed Hive table - data not getting compressed after loading into the Hive table
Labels: Apache Hive
Created ‎02-07-2017 08:22 AM
I have created a Hive Avro-based table with Snappy compression. The size of the Avro file is 2628 MB, and the data in the Hive Avro-based table without Snappy compression is 2296 MB. I created one more Avro Hive table with Snappy compression and loaded the same data, but there is no significant change in size. Also, when I describe the table, its properties show compression as 'No'.
Please find the table properties below.
Table Parameters:
    COLUMN_STATS_ACCURATE    True
    avro.compress            SNAPPY
    transient_lastDdlTime    1486455066

# Storage Information
    SerDe Library:        org.apache.hadoop.hive.serde2.avro.AvroSerDe
    InputFormat:          org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
    OutputFormat:         org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
    Compressed:           No
    Num Buckets:          -1
    Bucket Columns:       []
    Sort Columns:         []
Storage Desc Params:
    serialization.format  1
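For reference, a minimal sketch of the kind of DDL that would produce a table with the avro.compress=SNAPPY parameter shown above; the table and column names are placeholders, and it assumes a Hive version (0.14+) where STORED AS AVRO is available:

    -- Hypothetical example only: table and column names are made up for illustration.
    CREATE TABLE avro_hive_table (
      id     INT,
      name   STRING,
      amount DOUBLE
    )
    STORED AS AVRO
    TBLPROPERTIES ('avro.compress' = 'SNAPPY');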
Created ‎02-07-2017 04:30 PM
Could you please let me know how you loaded the data into the compressed table, i.e. the command you ran in Hive? After loading the data into the table, can you retrieve the data from it?
Created ‎02-07-2017 08:08 PM
I have a 2.6 GB text file. I loaded it into a Hive table stored as text. From that text table I loaded the data into the Avro-based Hive table (the Snappy-compressed one) with an INSERT INTO TABLE statement. Please feel free to ask if you need more details.
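A minimal sketch of the load flow described above, using hypothetical table names (staging_text, avro_hive_table), placeholder columns, and a made-up HDFS path:

    -- Hypothetical staging table stored as plain text.
    CREATE TABLE staging_text (id INT, name STRING, amount DOUBLE)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE;

    -- Path is a placeholder; point it at the actual 2.6 GB file.
    LOAD DATA INPATH '/tmp/input_data.txt' INTO TABLE staging_text;

    -- Copy from the text table into the Snappy-compressed Avro table.
    INSERT INTO TABLE avro_hive_table
    SELECT id, name, amount FROM staging_text;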
Created ‎02-08-2017 06:46 PM
Can you share your table creation script? I am not sure how you specified Snappy compression in it.
Created ‎02-08-2017 07:12 PM
If you specify "avro.compress=snappy" as TBLPROPERTIES, it will not work. You can try to set it in the command line: set hive.exec.compress.output=true; set avro.output.codec=snappy; and see whether it indeed compressed.
Created ‎02-09-2017 02:20 PM
Thanks Frank. I have tried both ways, but the compression ratio is still the same as the one provided in the question.
Created ‎06-07-2017 02:31 AM
Did you find an answer for this? I am facing the same issue. When I applied compression to an external table in text format I could see a change in the compression ratio, but when I applied the same to Avro by setting the following attributes in hive-site.xml and creating the table with "avro.compress=snappy" in TBLPROPERTIES, the compression ratio stayed the same. I am not sure whether compression is actually applied to this table. Is there any way to validate whether it is compressed or not? (See the sketch below the property list.)
"hive.exec.compress.output" : "true" "hive.exec.compress.intermediate" : "true" "avro.output.codec": "snappy"
