Resource Utilization for Distcp

When running a hadoop distcp command from a source cluster to a target cluster, is it possible to check the resource utilization on both the source and the target cluster?

1 ACCEPTED SOLUTION

@Al John Mangahas

DistCp launches MapReduce jobs on the cluster where the command is run. You can use the YARN UI on that cluster to monitor the job's progress and resource utilization.
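If you would rather check utilization from a script than from the UI, the YARN ResourceManager also serves cluster metrics over its REST API at /ws/v1/cluster/metrics. The following is only a rough sketch, assuming an unsecured cluster (no Kerberos/SPNEGO on the web endpoints) and a ResourceManager web address that you supply yourself. The function names are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url):
    """Fetch the clusterMetrics object from a ResourceManager web address.

    rm_url is an assumption you must fill in, e.g. "http://<rm-host>:8088".
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def utilization(metrics):
    """Return (memory %, vcore %) currently allocated, rounded to one decimal."""
    total_mb = metrics["totalMB"]
    total_vcores = metrics["totalVirtualCores"]
    # Guard against an empty cluster (no NodeManagers registered).
    mem = 100.0 * metrics["allocatedMB"] / total_mb if total_mb else 0.0
    cpu = 100.0 * metrics["allocatedVirtualCores"] / total_vcores if total_vcores else 0.0
    return round(mem, 1), round(cpu, 1)
```

Calling utilization(fetch_cluster_metrics(...)) against the ResourceManager of each cluster while the copy runs gives a quick view of YARN memory and vcore allocation on each side. It reports whole-cluster allocation, not just the DistCp job, and it does not cover HDFS or network load.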

That said, if you are copying from a Prod cluster to a DR cluster and are worried about resource usage, you can run the DistCp job on the DR cluster and have it "pull" the data from Prod. The MapReduce containers then run on DR, and Prod only has to serve the HDFS reads, so the resource impact on Prod is minimal.
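As a hedged illustration of the "pull" setup, the sketch below assembles a DistCp command meant to be launched from a node on the DR cluster. The NameNode addresses and paths are placeholders, and the helper name is hypothetical. The -m (maximum number of map tasks) and -bandwidth (MB per second per map) options are standard DistCp flags that can further limit how hard the copy reads from Prod, though suitable values depend on your environment.

```python
def build_distcp_command(src, dst, max_maps=None, bandwidth_mb=None):
    """Build the argv list for a DistCp run (hypothetical helper).

    src: source URI on Prod, e.g. "hdfs://<prod-namenode>:8020/path"
    dst: target URI on DR,   e.g. "hdfs://<dr-namenode>:8020/path"
    max_maps: optional cap on simultaneous map tasks (-m)
    bandwidth_mb: optional per-map bandwidth cap in MB/s (-bandwidth)
    """
    cmd = ["hadoop", "distcp"]
    if max_maps is not None:
        cmd += ["-m", str(max_maps)]
    if bandwidth_mb is not None:
        cmd += ["-bandwidth", str(bandwidth_mb)]
    cmd += [src, dst]
    return cmd


# Example, to be run on a DR cluster node so the job's containers use DR resources:
#   import subprocess
#   cmd = build_distcp_command("hdfs://<prod-namenode>:8020/data",
#                              "hdfs://<dr-namenode>:8020/data",
#                              max_maps=10, bandwidth_mb=50)
#   subprocess.run(cmd, check=True)
```

Because the job is submitted on DR, it shows up in the DR cluster's YARN UI, which is where its progress and container usage can be watched.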
