How to reduce HDFS output file size?
Labels: Apache Tez
Created 12-11-2015 11:41 AM
I sometimes insert data into a Hive table in two ways: with Hive and with Hive on Tez. The HDFS output files are twice as large when using Hive on Tez, so they take up more HDFS space. Is there any configuration that reduces the size?
Created 12-11-2015 02:50 PM
Have you looked into CompressedStorage features on Hive?
You should be able to use this (for Snappy at least):
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
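As a rough sketch of how these settings might be applied in one session, the snippet below enables output compression and then rewrites a table. The table names `events_staging` and `events_compressed` and their columns are hypothetical, and the SequenceFile storage format is an assumption chosen because `mapred.output.compression.type=BLOCK` applies to SequenceFile output. Whether this closes the size gap between the two engines depends on the table format in use.

```sql
-- Hypothetical tables for illustration: events_staging (source), events_compressed (target).
SET hive.execution.engine=tez;

-- Compress the final job output with Snappy.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
-- BLOCK compression is a SequenceFile setting (assumed storage format below).
SET mapred.output.compression.type=BLOCK;

CREATE TABLE IF NOT EXISTS events_compressed (
  id      BIGINT,
  payload STRING
)
STORED AS SEQUENCEFILE;

INSERT OVERWRITE TABLE events_compressed
SELECT id, payload
FROM events_staging;

-- Shows the table location and, when statistics are available, totalSize,
-- which can be compared between a MapReduce run and a Tez run.
DESCRIBE FORMATTED events_compressed;
```

Note that for ORC tables, compression is normally controlled by the `orc.compress` table property rather than by these session settings, so it is worth checking which storage format the target table uses before comparing sizes.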
Created 02-03-2016 03:46 PM
@Jun Chen, are you still having issues with this? Can you accept the best answer or provide your own solution?
