Created 10-05-2016 05:02 AM
Hi, I am trying to compress the output of distcp, but the output does not get compressed. Please help me compress it. Below is the command I am using:
hadoop distcp -D mapreduce.output.fileoutputformat.compress=true -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec inputdir outputdir
Created 10-05-2016 07:59 PM
This feature is still not available in Hadoop by default: distcp doesn't compress data unless you apply a patch. The following JIRA will give you all the details, including the patch to download.
https://issues.apache.org/jira/browse/HADOOP-8065
The following is the newer JIRA:
https://issues.apache.org/jira/browse/HADOOP-13114 --> use this one if you decide to apply the patch.
Created 10-05-2016 01:30 PM
@Arun Reddy Which version of HDP are you using?
Created 10-05-2016 06:46 PM
Plain Apache Hadoop, version 2.7.1.
Created 10-06-2016 06:07 AM
Thank you @mqureshi. How about HDP 2.4? Is that patch included in HDP 2.4 and above?
Created 10-06-2016 04:21 PM
Negative. If you check the JIRAs, they are unresolved. We don't ship unresolved issues in our product. So your only option right now is to download the patch and apply it to your installation. That will affect your support, if you have it, because you would be applying a non-Hortonworks patch.
I would suggest that you simply distcp the file and then compress it. Compressing during the copy would only save you a step; it would not save you any time or give better performance.
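A rough sketch of that copy-then-compress route, in case it helps. It assumes the bzip2 (or gzip) command-line tool is installed on the machine running the commands. The function name compress_hdfs_file and the path outputdir/part-00000 are made up for illustration and do not come from Hadoop itself. The data is streamed through the client, so this is only reasonable for modest file sizes.

```shell
# Hypothetical helper: stream one HDFS file through a local compressor
# and write the result back next to the original.
# $1 = HDFS path of the file, $2 = compressor (bzip2 by default, or gzip).
compress_hdfs_file() {
  src="$1"
  tool="${2:-bzip2}"
  case "$tool" in
    bzip2) ext=bz2 ;;
    gzip)  ext=gz ;;
    *) echo "unsupported compressor: $tool" >&2; return 1 ;;
  esac
  # "hadoop fs -put -" reads the data to upload from standard input.
  hadoop fs -cat "$src" | "$tool" -c | hadoop fs -put - "$src.$ext"
}

# Usage (not run here; paths are placeholders):
#   hadoop distcp inputdir outputdir
#   compress_hdfs_file outputdir/part-00000
```

The uncompressed original is left in place, so you can check the compressed copy first and then delete the original with hadoop fs -rm.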
Created 10-07-2016 06:20 AM
Thanks for your time. I am bringing the directory to the local filesystem and applying compression there.