
How to improve distcp performance



I want to run a distcp job that copies a huge amount of data from a source cluster to a destination cluster. How can I increase the performance or speed of the distcp job?


Re: How to improve distcp performance



All the options for distcp are listed in the DistCp documentation.

To improve performance, I used -strategy dynamic and increased both the number of mappers (-m) and the bandwidth per mapper (-bandwidth). You can also adjust the size of your containers if you want to customize it.

So you end up with:

hadoop distcp -prb -bandwidth 50 -m 16 -update -delete -strategy dynamic hdfs://source/path/.snapshot/20181030-170124.063 swebhdfs://target/path 
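
For anyone who wants to experiment with these values, here is a rough sketch that assembles the same command from variables and prints it instead of running it. The variable names and the build_distcp_cmd helper are made up for illustration. The -Dmapreduce.map.memory.mb setting and its 4096 value are only an assumption about how one might set the container size mentioned above; the numbers are placeholders, not recommendations. Since -bandwidth is a per-map limit in MB/s, the script also prints the resulting upper bound across all mappers.

```shell
#!/usr/bin/env bash
# Sketch only: builds and prints a distcp command; it does not run hadoop.
# All values below are placeholders to tune for your own clusters.

MAPPERS=16            # -m: maximum number of simultaneous map tasks
BANDWIDTH_MB=50       # -bandwidth: limit per map task, in MB/s
MAP_MEMORY_MB=4096    # assumed container size per map task (hypothetical value)

# Hypothetical helper: prints the full command for a source and a target path.
build_distcp_cmd() {
  local src="$1" dst="$2"
  echo "hadoop distcp" \
    "-Dmapreduce.map.memory.mb=${MAP_MEMORY_MB}" \
    "-prb -bandwidth ${BANDWIDTH_MB} -m ${MAPPERS}" \
    "-update -delete -strategy dynamic" \
    "${src} ${dst}"
}

# Upper bound on total throughput if every mapper reaches its limit.
echo "aggregate cap: $((MAPPERS * BANDWIDTH_MB)) MB/s"

build_distcp_cmd "hdfs://source/path/.snapshot/20181030-170124.063" "swebhdfs://target/path"
```

With the values shown, the cap works out to 16 x 50 = 800 MB/s, so raising -m without checking the network between the clusters can saturate the link. Note that the -D option has to come before the distcp-specific options.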