We copy 100+ tables daily between our production cluster and our DR cluster. These tables grow 0.5-1% per day.
Tables that took 5 minutes to copy a few months ago now take 15 minutes.
We understand that volume growth can slow things down, but not to this drastic an extent. Collectively, this is causing a lot of delay.
Copying the entire database now takes 3-4 hours longer.
Can you please suggest how we can improve the performance of the copy operation?
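As a rough sanity check on the numbers above, here is a minimal sketch of how much 0.5-1% daily growth compounds. It assumes "a few months" means about 90 days and that copy time scales linearly with data volume; neither assumption is stated in the question.

```python
def growth_factor(daily_rate: float, days: int) -> float:
    """Compound growth multiplier after `days` days at `daily_rate` per day."""
    return (1.0 + daily_rate) ** days


DAYS = 90            # assumption: "a few months" taken as roughly 90 days
BASE_MINUTES = 5.0   # copy time a few months ago, from the question

for rate in (0.005, 0.01):
    factor = growth_factor(rate, DAYS)
    # Assumption: copy time grows in proportion to data volume.
    print(f"{rate:.1%} daily for {DAYS} days: data about {factor:.2f}x larger, "
          f"copy about {BASE_MINUTES * factor:.1f} min")
```

Under these assumptions the multiplier is roughly 1.6x at 0.5% and 2.4x at 1%, so compounding growth may explain a large part of the slowdown, though not necessarily all of the 3x observed.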
There are some options available to improve the performance of DistCp, such as:
1. Controlling the Number of Mappers and Their Bandwidth
2. Accelerating File Listing
3. Working with Local Stores
Please check whether this helps: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/bk_cloud-data-access/content/distcp-s3-perf...
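The first two options correspond to standard DistCp flags (`-m`, `-bandwidth`, `-numListstatusThreads`), which are not specific to S3. Below is a minimal sketch that assembles such a command. The NameNode hosts, paths, and numeric values are hypothetical placeholders, not tuned recommendations.

```python
import shlex


def build_distcp_command(src, dst, num_maps=None, bandwidth_mb=None,
                         listing_threads=None):
    """Assemble a `hadoop distcp` argument list with optional tuning flags."""
    cmd = ["hadoop", "distcp"]
    if num_maps is not None:
        # -m: maximum number of simultaneous map tasks (copies)
        cmd += ["-m", str(num_maps)]
    if bandwidth_mb is not None:
        # -bandwidth: bandwidth limit per map, in MB/second
        cmd += ["-bandwidth", str(bandwidth_mb)]
    if listing_threads is not None:
        # -numListstatusThreads: threads used to build the file listing
        cmd += ["-numListstatusThreads", str(listing_threads)]
    cmd += [src, dst]
    return cmd


# Hypothetical hosts, paths, and values, for illustration only.
cmd = build_distcp_command(
    "hdfs://prod-nn:8020/warehouse/db/table1",
    "hdfs://dr-nn:8020/warehouse/db/table1",
    num_maps=40,
    bandwidth_mb=100,
    listing_threads=20,
)
print(shlex.join(cmd))
```

The printed string can be pasted into a shell or passed as a list to `subprocess.run`. Suitable values depend on the cluster's capacity and the network link between the two sites.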
Thanks for your reply, but we are not copying to S3 here. We are copying to another cluster.
The referenced article discusses ways to speed up copies to S3.
Do you know how we can speed up copying to another cluster rather than to S3?