Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Enabling HDFS compression

Cloudera Employee

The documentation for enabling HDFS compression (http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hdfs_admin_tools/content/ch04.html) recommends deprecated properties. Where can I find current documentation on how to enable compression on HDFS?

1 ACCEPTED SOLUTION


Hi sprasad,

The documentation should be fine with regard to enabling HDFS compression, but I agree that the config params (or at least their names) are deprecated. The old params are still supported and valid; however, you should switch to the new names.

Here is a list of deprecated properties and their new names: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
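For the compression-related properties specifically, the renames look like this (copied here for convenience; treat the linked page as the authoritative list):

```text
mapred.compress.map.output           →  mapreduce.map.output.compress
mapred.map.output.compression.codec  →  mapreduce.map.output.compress.codec
mapred.output.compress               →  mapreduce.output.fileoutputformat.compress
mapred.output.compression.codec      →  mapreduce.output.fileoutputformat.compress.codec
mapred.output.compression.type       →  mapreduce.output.fileoutputformat.compress.type
```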

To turn on HDFS compression using the new params, use the following configuration:

core-site.xml

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,
    org.apache.hadoop.io.compress.SnappyCodec</value>
  <description>A list of the compression codec classes that can be used
    for compression/decompression.</description>
</property>
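One thing to watch with this codec list: LzoCodec is not shipped with Apache Hadoop (it comes from the separate hadoop-lzo package), and SnappyCodec relies on the native Snappy library. Assuming a working Hadoop client installation, you can check which native compression libraries Hadoop can actually load before relying on them:

```shell
# Lists the native libraries Hadoop can load (zlib, snappy, lz4, bzip2, ...).
# A codec whose native library reports "false" here will fail at job runtime.
hadoop checknative -a
```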

mapred-site.xml

<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property> 
 
<property> 
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value> 
</property> 
 
<property> 
  <name>mapreduce.output.fileoutputformat.compress.type</name> 
  <value>BLOCK</value>
</property> 
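If you want compression for a particular job rather than cluster-wide, the same properties can also be passed per job with -D generic options (a sketch; the jar name, driver class, and paths below are placeholders, and the -D flags are only picked up this way if the driver goes through ToolRunner/GenericOptionsParser):

```shell
# Hypothetical job invocation with per-job map-output compression.
hadoop jar my-job.jar MyDriver \
  -Dmapreduce.map.output.compress=true \
  -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
  /input /output
```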

(Optional) Job output compression, mapred-site.xml

<property> 
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value> 
</property> 

<property> 
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value> 
</property> 
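Once job output compression is enabled, the reducer output files carry the codec's extension (e.g. part-r-00000.gz for GzipCodec). You can still inspect them from the shell, because `hadoop fs -text` (unlike `-cat`) decompresses through the configured codecs; the path below is a placeholder:

```shell
# -text detects the codec from the file extension and decompresses on the fly.
hadoop fs -text /output/part-r-00000.gz | head
```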


3 REPLIES


Super Collaborator

@sprasad thank you for the question, and @Jonas Straub, thanks for your response. I made a note to update our HDFS documentation.

Rising Star

Okay, so once the above is done, I still see 80% of the space in use. Shouldn't that trigger block-level compression of the data already on HDFS? If not, how is it done, if it's possible at all?

Also, I can't find the hadoop-examples.jar mentioned in the tutorial.