
How to improve distcp performance


I want to run a distcp job that copies a huge amount of data from a source cluster to a destination cluster. How can I increase the performance or speed of the distcp job?


Re: How to improve distcp performance


Hi,

All the options for distcp are listed here: https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html

To improve performance, I used -strategy dynamic and increased the number of mappers (-m) as well as the bandwidth per mapper (-bandwidth, in MB/second). You can also adjust the size of your containers if you want to customize it.

So you end up with something like this:

hadoop distcp -Dmapreduce.map.memory.mb=4096 -Dyarn.app.mapreduce.am.resource.mb=4096 -Dmapred.job.queue.name=DISTCP_exec -prb -bandwidth 50 -m 16 -update -delete -strategy dynamic hdfs://source/path/.snapshot/20181030-170124.063 swebhdfs://target/path
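
To get a feel for how -m and -bandwidth interact, here is a rough sketch. Since -bandwidth is a per-map limit in MB/second, the job as a whole cannot go faster than the number of maps times that limit. The helper names and the 1 TiB dataset size are hypothetical, chosen only for illustration, and real throughput will usually be lower because of network, disk, and NameNode limits.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Hadoop: back-of-the-envelope DistCp sizing.

# Upper bound on aggregate throughput in MB/s.
# -bandwidth caps each map, so the job-wide ceiling is maps * bandwidth.
distcp_max_throughput() {
    maps=$1
    bandwidth_mb=$2
    echo $((maps * bandwidth_mb))
}

# Lower bound on copy time in seconds for a dataset of size_mb megabytes,
# assuming every map runs at its full bandwidth cap for the whole job.
distcp_min_seconds() {
    size_mb=$1
    maps=$2
    bandwidth_mb=$3
    cap=$(distcp_max_throughput "$maps" "$bandwidth_mb")
    # Integer division rounded up.
    echo $(( (size_mb + cap - 1) / cap ))
}

# Values from the command above: -m 16 -bandwidth 50
distcp_max_throughput 16 50

# Assumed dataset size of 1 TiB (1048576 MB), for illustration only
distcp_min_seconds 1048576 16 50
```

If the estimated minimum time is already too long, raising -m or -bandwidth is the first thing to try, provided the queue has enough containers and the network can take the load.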