How to reduce HDFS output file size?

Explorer

I sometimes insert data into a Hive table in two ways: with Hive and with Hive on Tez. The HDFS output files are twice as large when using Hive on Tez, so they take up more HDFS space. Is there any configuration to reduce the size?

1 ACCEPTED SOLUTION

Contributor

Have you looked into CompressedStorage features on Hive?

You should be able to use this (for Snappy at least):

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
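
For illustration, here is a minimal sketch of how these settings might be applied around an insert. It uses the Hadoop 2+ property names that replace the deprecated `mapred.output.compression.*` ones; the table names `target_table` and `source_table` are hypothetical placeholders, not taken from the question.

```sql
-- Sketch only: target_table and source_table are hypothetical names.
-- SET applies to the current session, so run these in the same
-- session (Hive or Hive on Tez) that performs the insert.
SET hive.exec.compress.output=true;

-- Hadoop 2+ equivalents of mapred.output.compression.codec / .type
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapreduce.output.fileoutputformat.compress.type=BLOCK;

INSERT OVERWRITE TABLE target_table
SELECT * FROM source_table;
```

To check whether this helped, you could compare the table directory size before and after with `hdfs dfs -du -h` on the table's location. Note that the `BLOCK` compression type only matters for SequenceFile output.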

2 REPLIES

Mentor

@Jun Chen Are you still having issues with this? Can you accept the best answer or provide your own solution?
