Created 12-04-2015 09:15 AM
The documentation on enabling HDFS compression (http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hdfs_admin_tools/content/ch04.html) recommends deprecated properties. Where can I find the correct documentation on how to enable compression on HDFS?
Created 12-04-2015 09:30 AM
Hi sprasad,
The documentation should be fine with regard to enabling HDFS compression, but I agree that the config parameters (or at least their names) are deprecated. The old parameters are still supported and valid; however, you should switch to the new names.
Here is a list of deprecated properties and their new names: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
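To make the old names easier to recognize in older guides, here is a sketch of the same compression settings written with their deprecated names. The old-to-new mapping in the comments follows the deprecated-properties table linked above; double-check that table against your Hadoop version before relying on it.

```xml
<!-- Deprecated: mapred.compress.map.output; new name: mapreduce.map.output.compress -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<!-- Deprecated: mapred.map.output.compression.codec; new name: mapreduce.map.output.compress.codec -->
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
<!-- Deprecated: mapred.output.compression.type; new name: mapreduce.output.fileoutputformat.compress.type -->
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
<!-- Deprecated: mapred.output.compress; new name: mapreduce.output.fileoutputformat.compress -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<!-- Deprecated: mapred.output.compression.codec; new name: mapreduce.output.fileoutputformat.compress.codec -->
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```

Hadoop still accepts these old names and maps them to the new ones, but new configurations should use the new names shown in the rest of this answer.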
To turn on compression for MapReduce data written to HDFS using the new parameter names, use the following configuration:
core-site.xml
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  <description>A list of the compression codec classes that can be used for compression/decompression.</description>
</property>
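Note that com.hadoop.compression.lzo.LzoCodec does not ship with Apache Hadoop itself; it comes from the separately installed hadoop-lzo package. If that package is not on the classpath, listing it here can lead to "Compression codec ... not found" errors. A minimal sketch for a cluster without LZO, assuming you only need the codecs bundled with Hadoop, simply drops that entry:

```xml
<!-- Same codec list as above, minus LzoCodec (assumes hadoop-lzo is NOT installed) -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  <description>A list of the compression codec classes that can be used for compression/decompression.</description>
</property>
```

SnappyCodec is part of hadoop-common, but it still relies on the native Snappy library being available on the nodes.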
mapred-site.xml
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
<!-- Only applies to job output written as SequenceFiles with output compression enabled -->
<property>
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>BLOCK</value>
</property>
(Optional) Job output compression, mapred-site.xml
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
Created 12-11-2015 06:51 PM
@sprasad thank you for the question, and @Jonas Straub, thanks for your response. I made a note to update our HDFS documentation.
Created 09-27-2016 06:37 PM
Okay, so once the above is done, I still see 80% of the space in use. Shouldn't that trigger block-level compression of the existing data on HDFS? If not, how is it done, if it's possible at all?
Also, I can't find the hadoop-examples.jar mentioned in the tutorial.