
compression for an example mapreduce job


Dear community,

If I want to run the teragen program with output compression, is this the correct command?

sudo -u hdfs hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.13.0.jar teragen \
  -D mapreduce.map.output.compress=true \
  -D mapreduce.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
  -D mapreduce.output.fileoutputformat.compress=true \
  1000 /user/dev/teragen

 

Is my understanding correct? The first option enables compression of the intermediate (map output) data, the second specifies that gzip should be used for it, and the third ensures that the final job output is compressed as well. I have seen several variants of this command, some of them using deprecated property names. My teragen output is still not compressed, so something is still not right.
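For comparison, here is a minimal sketch of how the command might look with the non-deprecated property names spelled out. Two things are worth noting. The current name of the map-output codec property is mapreduce.map.output.compress.codec (the deprecated form was mapred.map.output.compression.codec), and the final output has its own codec property, mapreduce.output.fileoutputformat.compress.codec, which the command above does not set. The shell variable names and the dry-run switch RUN are made up for illustration. Also keep in mind that teragen is a map-only job, so the map-output settings probably have no effect on it, and as far as I can tell its output format may simply ignore the output compression flags; that part is an assumption to verify on your cluster, not a confirmed answer.

```shell
#!/bin/sh
# Sketch: assemble a teragen command with explicit compression properties.
# Dry run by default; set RUN=1 to actually execute (needs a Hadoop client).
# Variable names and the RUN switch are hypothetical, for illustration only.

JAR="hadoop-mapreduce-examples-2.6.0-cdh5.13.0.jar"   # jar name from the question
CODEC="org.apache.hadoop.io.compress.GzipCodec"
ROWS=1000
OUT_DIR="/user/dev/teragen"

build_cmd() {
  # Generic -D options go after the program name (teragen) and before
  # the program's own arguments (row count and output directory).
  printf '%s ' \
    hadoop jar "$JAR" teragen \
    -D mapreduce.map.output.compress=true \
    -D "mapreduce.map.output.compress.codec=$CODEC" \
    -D mapreduce.output.fileoutputformat.compress=true \
    -D "mapreduce.output.fileoutputformat.compress.codec=$CODEC" \
    "$ROWS" "$OUT_DIR"
}

CMD=$(build_cmd)
echo "$CMD"

# Only run the job when explicitly requested.
if [ "${RUN:-0}" = "1" ]; then
  sudo -u hdfs $CMD
fi
```

Running the script without RUN=1 only prints the assembled command, so the property names can be checked before submitting the job. Afterwards, listing the output directory (for example with hdfs dfs -ls /user/dev/teragen) and looking for a .gz suffix on the part files would show whether compression was applied.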

 

Thanks!