How to improve distcp performance
Labels: Apache Hadoop
Guru
Created 10-23-2018 02:00 PM
I want to run a DistCp job that copies a very large amount of data from a source cluster to a destination cluster. How can I increase the performance (speed) of the DistCp job?
1 REPLY
New Contributor
Created 10-30-2018 05:38 PM
Hi,
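You can find all the DistCp options here: https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html
To improve performance, I used -strategy dynamic (instead of giving every mapper a fixed, equal-size share of the files, the file list is split into chunks and faster mappers pick up more of them). I also increased the number of mappers (-m) and the bandwidth per mapper in MB/s (-bandwidth). If you want, you can also customize the size of your containers (mapreduce.map.memory.mb for the map tasks, yarn.app.mapreduce.am.resource.mb for the ApplicationMaster).
So the final command looks like this: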
hadoop distcp -Dmapreduce.map.memory.mb=4096 -Dyarn.app.mapreduce.am.resource.mb=4096 -Dmapred.job.queue.name=DISTCP_exec \
  -prb -bandwidth 50 -m 16 -update -delete -strategy dynamic \
  hdfs://source/path/.snapshot/20181030-170124.063 swebhdfs://target/path
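If you tune these values often, it can help to keep them in variables and print the resulting command before running it. The script below is only a sketch: the variable and function names (MAPS, BW_PER_MAP, aggregate_bandwidth, DISTCP_CMD, RUN) are hypothetical, and the values are copied from the command above. Since -bandwidth is a per-map limit in MB/s, multiplying it by -m gives a rough ceiling on the job's total throughput.

```bash
#!/usr/bin/env bash
# Sketch: assemble the DistCp command from tunable variables and print it.
# Variable/function names are hypothetical; values mirror the command above.
# Set RUN=1 to execute it instead (requires the hadoop client on PATH).
set -euo pipefail

MAPS=16                 # -m: maximum number of simultaneous copy maps
BW_PER_MAP=50           # -bandwidth: throttle per map, in MB/s
MAP_MEM_MB=4096         # mapreduce.map.memory.mb: container size per map
AM_MEM_MB=4096          # yarn.app.mapreduce.am.resource.mb: ApplicationMaster container
QUEUE=DISTCP_exec       # YARN queue the job is submitted to
SRC=hdfs://source/path/.snapshot/20181030-170124.063
DST=swebhdfs://target/path

# Rough ceiling on total copy throughput (MB/s): maps x per-map limit.
aggregate_bandwidth() {
  echo $(( $1 * $2 ))
}

# -prb preserves replication (r) and block size (b).
# -update copies only files that are missing or differ on the target;
# -delete removes target files that no longer exist in the source.
DISTCP_CMD=(
  hadoop distcp
  "-Dmapreduce.map.memory.mb=${MAP_MEM_MB}"
  "-Dyarn.app.mapreduce.am.resource.mb=${AM_MEM_MB}"
  "-Dmapred.job.queue.name=${QUEUE}"
  -prb
  -bandwidth "${BW_PER_MAP}"
  -m "${MAPS}"
  -update -delete
  -strategy dynamic
  "${SRC}" "${DST}"
)

echo "Aggregate bandwidth ceiling: $(aggregate_bandwidth "$MAPS" "$BW_PER_MAP") MB/s"
if [[ "${RUN:-0}" == 1 ]]; then
  "${DISTCP_CMD[@]}"
else
  # Dry run: print the command for review.
  echo "${DISTCP_CMD[*]}"
fi
```

With 16 maps at 50 MB/s each, the script reports a ceiling of 800 MB/s. Raising -m only helps if the network between the clusters and the DataNodes can sustain it and the YARN queue has room for the extra 4096 MB containers. Because -delete removes files from the target, review the printed command before setting RUN=1.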
