- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
How to reduce HDFS output file size?
- Labels:
-
Apache Tez
Created ‎12-11-2015 11:41 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Sometimes I insert data to hive table using two ways:Hive and Hive on Tez.The HDFS output file size is twice when using hive on Tez. It take up more hdfs space.Is there any configurations to reduce the size?
Created ‎12-11-2015 02:50 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Have you looked into CompressedStorage features on Hive?
You should be able to use this (for Snappy at least):
SET hive.exec.compress.output=true; SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec; SET mapred.output.compression.type=BLOCK;
Created ‎12-11-2015 02:50 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Have you looked into CompressedStorage features on Hive?
You should be able to use this (for Snappy at least):
SET hive.exec.compress.output=true; SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec; SET mapred.output.compression.type=BLOCK;
Created ‎02-03-2016 03:46 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Jun Chen are you still having issues with this? Can you accept best answer or provide your own solution?
