Enabling HDFS compression
Labels: Apache Hadoop
Created 12-04-2015 09:15 AM
The documentation for enabling HDFS compression (http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hdfs_admin_tools/content/ch04.html) recommends deprecated properties. Where do I find up-to-date documentation on how to enable compression on HDFS?
Created 12-04-2015 09:30 AM
Hi sprasad,
The documentation is still fine as far as enabling HDFS compression goes, but I agree that the config params (or at least their names) are deprecated. The old config params are still supported and valid; however, you should switch to the new names.
Here is a list of the deprecated properties and their new names: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
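For the compression settings in question, that page maps the names as follows:
- mapred.compress.map.output → mapreduce.map.output.compress
- mapred.map.output.compression.codec → mapreduce.map.output.compress.codec
- mapred.output.compress → mapreduce.output.fileoutputformat.compress
- mapred.output.compression.codec → mapreduce.output.fileoutputformat.compress.codec
- mapred.output.compression.type → mapreduce.output.fileoutputformat.compress.type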
To turn on HDFS compression using the new params, use the following configuration:
core-site.xml
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  <description>A list of the compression codec classes that can be used for compression/decompression.</description>
</property>
mapred-site.xml
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>BLOCK</value>
</property>
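(Note: mapreduce.output.fileoutputformat.compress.type only affects SequenceFile job output; BLOCK compresses batches of records together, which typically compresses better than per-record compression.)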
(Optional) Job output compression, mapred-site.xml
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
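To quickly verify that the codec registration is picked up, you can use a small Java sketch like the one below (my own illustration, not from the docs; the output path is hypothetical, and the client is assumed to have the core-site.xml above on its classpath):

import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CompressionCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // loads core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);

        // Resolve GzipCodec through the factory so a misconfigured
        // io.compression.codecs list fails early and visibly.
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        CompressionCodec codec =
                factory.getCodecByClassName("org.apache.hadoop.io.compress.GzipCodec");
        if (codec == null) {
            throw new IllegalStateException("GzipCodec not registered in io.compression.codecs");
        }

        // Write a small compressed file; the path is a hypothetical example.
        Path out = new Path("/tmp/compression-test" + codec.getDefaultExtension());
        try (OutputStream os = codec.createOutputStream(fs.create(out))) {
            os.write("hello, compressed HDFS\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Wrote " + out + " via " + codec.getClass().getSimpleName());
    }
}

For jobs launched through ToolRunner, the mapred-site.xml settings above can also be passed per job on the command line (e.g. -Dmapreduce.output.fileoutputformat.compress=true) instead of being set cluster-wide.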
Created 12-11-2015 06:51 PM
@sprasad thank you for the question, and @Jonas Straub, thanks for your response. I made a note to update our HDFS documentation.
Created 09-27-2016 06:37 PM
Okay, so once the above is done, I still see 80% of the space in use. Shouldn't that trigger block-level compression of the data already on HDFS? If not, how is that done, if it's possible at all?
Also, I can't find the hadoop-examples.jar mentioned in their tutorial.
