Enabling HDFS compression


Cloudera Employee

The documentation for enabling HDFS compression (http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hdfs_admin_tools/content/ch04.html) recommends deprecated properties. Where do I find up-to-date documentation on how to enable compression on HDFS?

1 ACCEPTED SOLUTION


Re: Enabling HDFS compression

Hi sprasad,

The documentation is fine as far as enabling HDFS compression goes, but I agree that the config parameters (or at least their names) are deprecated. The old parameter names are still supported and valid; however, you should switch to the new names.

Here is a list of deprecated values and their new names: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/DeprecatedProperties.html

To turn on HDFS compression using the new parameter names, use the following configuration:

core-site.xml

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  <description>A list of the compression codec classes that can be used
    for compression/decompression.</description>
</property>
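Each of the codec classes above is associated with a default file-name suffix that Hadoop uses to pick a decompressor when it reads a file. A minimal sketch of that suffix-based lookup (the suffix-to-codec mapping reflects the usual Hadoop defaults and is an assumption, not taken from this thread):

```python
# Illustrative only: how a suffix-based codec lookup works, similar in
# spirit to Hadoop's CompressionCodecFactory. The suffixes below are the
# customary defaults for these codec classes (assumed, not from the thread).
CODEC_SUFFIXES = {
    ".gz": "org.apache.hadoop.io.compress.GzipCodec",
    ".deflate": "org.apache.hadoop.io.compress.DefaultCodec",
    ".lzo": "com.hadoop.compression.lzo.LzoCodec",
    ".snappy": "org.apache.hadoop.io.compress.SnappyCodec",
}

def codec_for(path: str):
    """Return the codec class name matching the file suffix, or None."""
    for suffix, codec in CODEC_SUFFIXES.items():
        if path.endswith(suffix):
            return codec
    return None  # no matching suffix: treat the file as uncompressed

print(codec_for("part-r-00000.gz"))   # GzipCodec
print(codec_for("part-r-00000"))      # None (plain text)
```

This is why a codec only needs to be listed in io.compression.codecs once: after that, any file with a recognized suffix is decompressed transparently on read.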

mapred-site.xml

<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property> 
 
<property> 
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value> 
</property> 
 
<property> 
  <name>mapreduce.output.fileoutputformat.compress.type</name> 
  <value>BLOCK</value>
</property> 
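The map-output settings above compress the intermediate data that is shuffled from mappers to reducers. Intermediate output tends to be highly repetitive (many repeated keys), so it compresses very well, which is what makes this worth the CPU cost. A small self-contained illustration of the effect (the sample record is made up):

```python
import gzip

# Simulated intermediate map output: repeated key/value records,
# as typically produced by a word-count-style map phase.
record = b"user42\t1\n"          # hypothetical key/value record
map_output = record * 10_000     # 90,000 bytes of repetitive data

compressed = gzip.compress(map_output)
print(len(map_output), len(compressed))
# Repetitive intermediate data shrinks dramatically, so far fewer
# bytes cross the network during the shuffle.
assert len(compressed) < len(map_output) // 10
```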

(Optional) Job output compression, mapred-site.xml

<property> 
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value> 
</property> 

<property> 
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value> 
</property> 
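With GzipCodec as the output codec, each reducer's part file is a standard gzip stream, so once copied out of HDFS (for example with hdfs dfs -get) it can be inspected with ordinary gzip tools. A local round-trip sketch (the part-file name is illustrative):

```python
import gzip
import os
import tempfile

# Simulate a reducer's compressed part file locally; the file name
# mirrors the usual MapReduce output naming convention.
lines = b"key1\tvalue1\nkey2\tvalue2\n"
path = os.path.join(tempfile.mkdtemp(), "part-r-00000.gz")

with gzip.open(path, "wb") as f:
    f.write(lines)

# Any standard gzip reader can decompress the file.
with gzip.open(path, "rb") as f:
    assert f.read() == lines
```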


3 REPLIES

Re: Enabling HDFS compression

Expert Contributor

@sprasad thank you for the question, and @Jonas Straub, thanks for your response. I made a note to update our HDFS documentation.


Re: Enabling HDFS compression

Contributor

Okay, so once the above is done, I still see 80% of the space in use. Shouldn't that initiate block-level compression of the data already on HDFS? If not, how is that done, if it's possible at all?

Also, I can't find the hadoop-examples.jar mentioned in the tutorial.
