Created 09-27-2016 09:19 PM
Created 09-30-2016 05:00 PM
Not sure of your exact question, but it is typically a good idea to compress the output of the map step in MapReduce jobs. This data is written to disk and then sent across the cluster to the reducers (the shuffle), and the CPU overhead of compressing and decompressing is almost always small compared to the gain from moving significantly less data over the wire.
To set this for all of your jobs, use these configs in mapred-site.xml:
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
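Note that the two property names above are the older mapred.* forms. On Hadoop 2 they are still accepted but are listed as deprecated in favor of the mapreduce.* names. As a sketch, assuming a Hadoop 2 cluster, the equivalent settings would look like this:

```xml
<!-- Hadoop 2 (MRv2) names for the same two settings -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```

Either pair should have the same effect; it is probably best to pick one naming style and use it consistently.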
You can of course set the first value to false in mapred-site.xml and override it per job (e.g. as a parameter on the command line or with a set statement at the top of a Pig script).
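One way to sketch the per-job override is a small configuration file passed with the generic -conf option. This assumes the job's driver is run through ToolRunner (so that generic options are parsed); the file name compress-overrides.xml is hypothetical:

```xml
<?xml version="1.0"?>
<!-- compress-overrides.xml (hypothetical name): settings for one job only -->
<configuration>
  <!-- Turn map output compression on for this job, even if the
       cluster-wide default in mapred-site.xml is false -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>
```

The file would be passed on the command line as -conf compress-overrides.xml, placed before the job's own arguments. For a single property, -D mapred.compress.map.output=true does the same thing without a file.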
See this link for details: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_hdfs_admin_tools/content/ch04.html