
Compression for an example MapReduce job

Dear community,

If I want to run the teragen program with output compression, is this the correct command?

sudo -u hdfs hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.13.0.jar teragen \
  -D mapreduce.map.output.compress=true \
  -D mapreduce.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
  -D mapreduce.output.fileoutputformat.compress=true \
  1000 /user/dev/teragen

 

Is my understanding correct? The first option enables compression of the intermediate (map) output, the second option selects the Gzip codec for that intermediate output, and the third option should ensure that the final job output is compressed as well. I have seen several variants of these commands, some of them using deprecated property names. My teragen output is still not compressed, so something is still wrong.
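One way to confirm whether an output file is really gzip-compressed, independent of its file name, is to look at its first two bytes, which are 1f 8b for gzip data. The following is only a minimal sketch: it assumes one part file has already been copied to the local filesystem, and the helper name is_gzip and the file paths in the comments are hypothetical.

```shell
#!/usr/bin/env bash

# Succeeds (exit status 0) if the given local file starts with the
# gzip magic bytes 1f 8b, fails otherwise.
is_gzip() {
  local magic
  # Read the first two bytes and print them as hex without offsets or spaces.
  magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "1f8b" ]
}

# Hypothetical usage (the part file name is an assumption):
#   hdfs dfs -get /user/dev/teragen/part-m-00000 /tmp/part-m-00000
#   if is_gzip /tmp/part-m-00000; then echo "gzip"; else echo "not gzip"; fi
```

This only inspects the file header, so it shows whether the final output was written as gzip; it says nothing about whether the intermediate map output was compressed.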

 

Thanks!
