How to reduce HDFS output file size?

Explorer

I sometimes insert data into a Hive table in two ways: with Hive and with Hive on Tez. The HDFS output files are twice as large when using Hive on Tez, so they take up more HDFS space. Are there any configuration settings to reduce the size?

1 ACCEPTED SOLUTION

Rising Star

Have you looked into the CompressedStorage features in Hive?

You should be able to use this (for Snappy at least):

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
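
As a rough sketch of how these settings might be used in one session: the table names (sales_staging, sales_compressed) and the warehouse path below are hypothetical, and the example assumes the target is a text or SequenceFile table. The settings are session-scoped, so they need to be issued before the INSERT in both the Hive and the Hive on Tez runs for the comparison to be fair.

```sql
-- Hypothetical table names and path, for illustration only.
-- Assumes sales_compressed is stored as TEXTFILE or SEQUENCEFILE.

-- Pick the engine under test; switch to mr to compare against the other run.
SET hive.execution.engine=tez;

-- Compress the final job output with Snappy.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
-- BLOCK vs RECORD only matters for SequenceFile output.
SET mapred.output.compression.type=BLOCK;

INSERT OVERWRITE TABLE sales_compressed
SELECT * FROM sales_staging;

-- Check the resulting size on HDFS from the Hive session (path is an assumption).
dfs -du -s -h /apps/hive/warehouse/sales_compressed;
```

If the table is stored as ORC or Parquet, compression is usually governed by the table's own storage properties (for example orc.compress) rather than by these session settings, so it is worth checking the table definition as well.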


2 REPLIES

Master Mentor

@Jun Chen Are you still having issues with this? Could you accept the best answer or post your own solution?