improve performance of distcp operation / activity

New Contributor

We copy 100+ tables daily from our production cluster to our DR cluster. These tables grow 0.5-1% per day.

Tables that took 5 minutes to copy a few months ago now take 15 minutes.

We understand that volume growth can slow the copy, but not to this drastic extent. Collectively, this is causing a lot of delay.

Copying the entire database now takes 3-4 hours longer.

Can you please suggest how we can improve the performance of the copy operation?
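
As a rough sanity check on the numbers above, the sketch below compounds the stated daily growth rates. It assumes copy time scales linearly with data volume, which is an assumption and not something measured on these clusters. The function names are illustrative only.

```python
import math

def growth_after(days: int, daily_rate: float) -> float:
    """Size multiplier after `days` of compounding growth at `daily_rate`."""
    return (1 + daily_rate) ** days

def days_to_grow(factor: float, daily_rate: float) -> float:
    """Days of compounding growth needed for the data to grow by `factor`."""
    return math.log(factor) / math.log(1 + daily_rate)

if __name__ == "__main__":
    # Rates taken from the post: 0.5% and 1% per day.
    for rate in (0.005, 0.01):
        print(f"rate={rate:.1%}: "
              f"x{growth_after(90, rate):.2f} after 90 days, "
              f"{days_to_grow(3, rate):.0f} days to reach 3x")
```

If the result is well short of 3x for the period in question, growth alone probably does not explain the 5 to 15 minute slowdown, and the DistCp settings are worth examining.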

2 REPLIES

Re: improve performance of distcp operation / activity

Super Mentor

@akash sharma

There are some options for improving DistCp performance, such as:

1. Controlling the Number of Mappers and Their Bandwidth
2. Accelerating File Listing
3. Working with Local Stores
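
The first two options can be sketched as a small Python helper that assembles a `hadoop distcp` command line. This is a minimal sketch, not a tuned configuration: the NameNode hosts, paths, and numeric values are hypothetical placeholders. The flags themselves (`-update`, `-strategy dynamic`, `-m`, `-bandwidth`, `-numListstatusThreads`) are standard DistCp options, but suitable values depend on your clusters and network.

```python
import shlex

def build_distcp_command(src, dst, num_maps=40, bandwidth_mb=50, listing_threads=20):
    """Return the argv list for an incremental DistCp run between two clusters.

    All default values are illustrative assumptions, not recommendations.
    """
    return [
        "hadoop", "distcp",
        "-update",                                       # copy only files that differ at the target
        "-strategy", "dynamic",                          # let faster mappers pick up more work
        "-m", str(num_maps),                             # maximum number of simultaneous map tasks
        "-bandwidth", str(bandwidth_mb),                 # per-map bandwidth limit in MB/s
        "-numListstatusThreads", str(listing_threads),   # threads used to build the file listing
        src, dst,
    ]

if __name__ == "__main__":
    # Hypothetical source and target URIs.
    cmd = build_distcp_command(
        "hdfs://prod-nn:8020/warehouse/db/table",
        "hdfs://dr-nn:8020/warehouse/db/table",
    )
    print(shlex.join(cmd))  # print the command for review before running it
```

Printing the command first makes it easy to review; passing the list to `subprocess.run(cmd, check=True)` would execute it on a host with the Hadoop client installed.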

Please check this if it helps: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/bk_cloud-data-access/content/distcp-s3-perf...

Re: improve performance of distcp operation / activity

New Contributor

Thanks for your reply, but we are not copying to S3 here. We are copying to another cluster.

The referenced article discusses ways to speed up copies to S3.

Do you know how we can speed up copying to another cluster rather than to S3?
