Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

How to reduce HDFS output file size?

New Member

I sometimes insert data into a Hive table in two ways: with Hive and with Hive on Tez. The HDFS output files are twice as large when using Hive on Tez, so they take up more HDFS space. Is there any configuration to reduce the size?

1 ACCEPTED SOLUTION

Rising Star

Have you looked into CompressedStorage features on Hive?

You should be able to use this (for Snappy at least):

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
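
As a minimal sketch of how these settings might be applied, the session below sets them and then runs an insert. The table names `sales_staging` and `sales_compressed` and the warehouse path are hypothetical placeholders, not taken from the question, and the settings only affect statements run later in the same session.

```sql
-- Enable compressed output for this session (settings from the answer above)
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

-- Hypothetical tables: rewrite the target so its files are written with the codec above
INSERT OVERWRITE TABLE sales_compressed
SELECT * FROM sales_staging;

-- From the Hive CLI, check the resulting size on HDFS (hypothetical path)
dfs -du -s -h /apps/hive/warehouse/sales_compressed;
```

Running the same insert once on each engine and comparing the `dfs -du` output should show whether the size difference remains. If the target table is stored as ORC, compression is normally controlled by the table property `orc.compress` rather than by these output settings, so it is worth checking the table's storage format first.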


2 REPLIES 2

Master Mentor

@Jun Chen, are you still having issues with this? Can you accept the best answer or provide your own solution?