Resource Utilization for Distcp
Labels: Apache Hadoop
Created ‎07-31-2017 02:07 AM
When running a hadoop distcp command from a source cluster to a target cluster, is it possible to check the resource utilization on both the source and the target cluster?
Created ‎07-31-2017 06:55 PM
DistCp launches a MapReduce job on the cluster where you run the command. You can use the YARN ResourceManager UI on that cluster to monitor the job's progress and resource utilization.
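If you would rather script this than watch the UI, the ResourceManager also exposes the same information over its REST API. Below is a minimal sketch, assuming the ResourceManager web address is reachable from where you run it, that your Hadoop version reports the `allocatedMB`, `allocatedVCores` and `runningContainers` fields, and that the job name contains "distcp" (the filter on the name is my assumption, adjust it if you set a custom job name). The function names are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_running_apps(rm_url):
    """Fetch running MapReduce applications from the ResourceManager REST API.

    rm_url is the ResourceManager web address, e.g. "http://rm-host:8088"
    (hypothetical host name).
    """
    url = f"{rm_url}/ws/v1/cluster/apps?states=RUNNING&applicationTypes=MAPREDUCE"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_distcp(payload):
    """Pick out DistCp jobs from an apps response and keep the usage fields."""
    # The API returns {"apps": null} when nothing matches, so guard for that.
    apps = (payload.get("apps") or {}).get("app") or []
    rows = []
    for app in apps:
        # Assumption: the job name contains "distcp".
        if "distcp" not in app.get("name", "").lower():
            continue
        rows.append({
            "id": app.get("id"),
            "progress": app.get("progress"),
            "allocated_mb": app.get("allocatedMB"),
            "allocated_vcores": app.get("allocatedVCores"),
            "running_containers": app.get("runningContainers"),
        })
    return rows


# Example usage (needs a live ResourceManager, so it is left commented out):
# for row in summarize_distcp(fetch_running_apps("http://rm-host:8088")):
#     print(row)
```

Run it against the ResourceManager of the cluster that executes the DistCp job; the other cluster will not show the job in YARN at all.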
If you are copying from a Prod cluster to a DR cluster and are worried about resource usage on Prod, you can run the DistCp job on the DR cluster and have it "pull" the data from Prod. That way the YARN containers (CPU and memory) are consumed on DR, Prod only has to serve the HDFS reads, and the impact on Prod resources is minimal.
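As a rough illustration of the "pull" setup, here is a small sketch that builds the DistCp command you would launch from a node on the DR cluster. The NameNode addresses and paths are placeholders, and the default values for `-m` (maximum number of simultaneous map tasks) and `-bandwidth` (MB per second per map) are arbitrary numbers chosen for the example, not recommendations. The helper name is made up.

```python
import shlex


def build_pull_distcp(src_uri, dst_uri, max_maps=10, bandwidth_mb=20):
    """Build a 'hadoop distcp' command line to be launched on the DR cluster.

    -m caps the number of simultaneous map tasks and -bandwidth caps the
    per-map copy rate in MB/s, which together limit the read load on Prod.
    """
    return [
        "hadoop", "distcp",
        "-m", str(max_maps),
        "-bandwidth", str(bandwidth_mb),
        src_uri,
        dst_uri,
    ]


# Hypothetical NameNode hosts and paths, replace with your own.
cmd = build_pull_distcp(
    "hdfs://prod-nn:8020/data/events",
    "hdfs://dr-nn:8020/data/events",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

You can paste the printed command into a shell on a DR edge node, or pass `cmd` to `subprocess.run(cmd, check=True)` there. Since the job is submitted to DR's YARN, the DR ResourceManager UI is where it will show up.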
