How to improve distcp performance

I want to run a distcp job that copies a large amount of data from a source cluster to a destination cluster. How can I increase the performance or speed of the distcp job?

1 REPLY

Hi,

You can find all the options for distcp here: https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html

To improve performance, I used -strategy dynamic and increased both the number of mappers (-m) and the bandwidth per mapper (-bandwidth, in MB/s). You can also adjust the size of your containers if you want to customize it.

So you end up with something like:

hadoop distcp -Dmapreduce.map.memory.mb=4096 -Dyarn.app.mapreduce.am.resource.mb=4096 -Dmapred.job.queue.name=DISTCP_exec -prb -bandwidth 50 -m 16 -update -delete -strategy dynamic hdfs://source/path/.snapshot/20181030-170124.063 swebhdfs://target/path
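
If you want to try different values for -m and -bandwidth without retyping the whole command, a small wrapper can help. The sketch below is only an illustration: the function name build_distcp_cmd and its argument order are made up for this example, and the defaults simply repeat the values from the command above. It only prints the command (a dry run), so you can check it before running it on the cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): prints a distcp command line
# so that the tuning knobs can be varied between runs.
# Defaults repeat the values used above; they are starting points, not recommendations.

build_distcp_cmd() (
  # The function body is a subshell, so these variables stay local to it.
  src="$1"
  dst="$2"
  maps="${3:-16}"        # -m: maximum number of simultaneous copies (map tasks)
  bandwidth="${4:-50}"   # -bandwidth: limit per map, in MB/s
  map_mem="${5:-4096}"   # memory requested per map container, in MB

  # printf reuses the '%s ' format for every argument, printing them space-separated.
  printf '%s ' hadoop distcp \
    "-Dmapreduce.map.memory.mb=${map_mem}" \
    -strategy dynamic \
    -m "${maps}" \
    -bandwidth "${bandwidth}" \
    -update
  printf '%s %s\n' "${src}" "${dst}"
)

# Dry run with 32 maps at 100 MB/s each: prints the command instead of running it.
build_distcp_cmd hdfs://source/path swebhdfs://target/path 32 100
```

Keep in mind that the total throughput is capped at roughly the number of maps times the per-map bandwidth (16 x 50 = 800 MB/s in the command above), so raise the two together only as far as your network and the destination cluster can handle.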