Support Questions

Find answers, ask questions, and share your expertise

Enabling HDFS compression

Cloudera Employee

The documentation for enabling HDFS compression (http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hdfs_admin_tools/content/ch04.html) recommends using deprecated properties. Where do I find up-to-date documentation on how to enable compression on HDFS?

1 ACCEPTED SOLUTION


Hi sprasad,

The documentation itself is fine as far as enabling HDFS compression goes, but I agree that the config parameters (or at least their names) are deprecated. The old parameter names are still supported and valid; however, you should switch to the new ones.

Here is the list of deprecated property names and their replacements: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
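For the compression-related keys specifically, the old-to-new mapping on that page boils down to a handful of renames. A quick reference, expressed as a Python dict:

```python
# Deprecated Hadoop property names -> their mapreduce.* replacements,
# per hadoop-common's Deprecated Properties page (compression keys only).
DEPRECATED_TO_NEW = {
    "mapred.compress.map.output": "mapreduce.map.output.compress",
    "mapred.map.output.compression.codec": "mapreduce.map.output.compress.codec",
    "mapred.output.compress": "mapreduce.output.fileoutputformat.compress",
    "mapred.output.compression.codec": "mapreduce.output.fileoutputformat.compress.codec",
    "mapred.output.compression.type": "mapreduce.output.fileoutputformat.compress.type",
}

for old, new in DEPRECATED_TO_NEW.items():
    print(f"{old}  ->  {new}")
```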

To turn on HDFS compression using the new parameter names, use the following configuration:

core-site.xml

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,
    org.apache.hadoop.io.compress.SnappyCodec</value>
  <description>A list of the compression codec classes that can be used
    for compression/decompression.</description>
</property>

mapred-site.xml

<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property> 
 
<property> 
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value> 
</property> 
 
<property> 
  <name>mapreduce.output.fileoutputformat.compress.type</name> 
  <value>BLOCK</value>
</property> 

(Optional) Job output compression, mapred-site.xml

<property> 
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value> 
</property> 

<property> 
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value> 
</property> 
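One practical note on choosing GzipCodec for the final job output: the part files it writes are standard RFC 1952 gzip streams, so they can be read back by any gzip-aware tool, not only through Hadoop. A minimal round-trip sketch in Python (the record content here is just illustrative):

```python
import gzip
import io

# Hadoop's GzipCodec emits standard gzip streams, so job output written
# with mapreduce.output.fileoutputformat.compress=true and GzipCodec can
# be consumed by Python's gzip module (or gunzip) outside the cluster.
record = b"hello\t1\nworld\t2\n"

# Write a gzip stream, the same on-disk format as a reducer's part file.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(record)

# Read it back as any gzip-aware consumer would.
with gzip.GzipFile(fileobj=io.BytesIO(buf.getvalue()), mode="rb") as f:
    restored = f.read()

assert restored == record
print("gzip round-trip ok")
```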


3 REPLIES


Super Collaborator

@sprasad thank you for the question, and @Jonas Straub, thanks for your response. I made a note to update our HDFS documentation.

Rising Star

Okay, so once the above is done, I still see 80% of the space in use. Shouldn't that initiate block-level compression of the data on HDFS? If not, how is it done, if it's possible at all?

Also, I can't find the hadoop-examples.jar mentioned in the tutorial.