How to check which mapper and reducer tasks are running on which node (rack)

I have 3 racks: 1. local rack, 2. remote rack, 3. DR.

I want to test whether tasks such as map and reduce are processed on the local server or on a remote server.

The main objective is to reduce network latency (and with it the MapReduce processing time) by getting as many tasks as possible processed on the same rack, i.e. the local rack.

Re: How to check which mapper and reducer tasks are running on which node (rack)

@tauqeer khan

The concept of rack topology is part of the block placement policy, i.e. it applies while data is being written to HDFS.

More details - https://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-project-dist/hadoop-common/RackAwareness.html
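As a rough illustration of how racks get defined in the first place: rack awareness is usually driven by a topology script (configured through net.topology.script.file.name in core-site.xml) that receives host names or IPs as arguments and prints one rack path per argument. The sketch below assumes three racks like yours; every IP address and rack name in it is a made-up placeholder, not something taken from your cluster.

```python
#!/usr/bin/env python3
"""Sketch of a rack topology script: prints one rack path per host argument."""
import sys

# Hypothetical host-to-rack table; replace with your own hosts or IPs.
RACK_BY_HOST = {
    "10.0.1.11": "/local-rack",
    "10.0.1.12": "/local-rack",
    "10.0.2.21": "/remote-rack",
    "10.0.3.31": "/dr-rack",
}

# Fallback rack for hosts that are not listed in the table.
DEFAULT_RACK = "/default-rack"


def resolve_rack(host):
    """Return the rack path for a host, or the default rack if unknown."""
    return RACK_BY_HOST.get(host, DEFAULT_RACK)


def main(argv):
    # Hadoop passes one or more hosts/IPs as arguments and reads
    # whitespace-separated rack paths, one per argument, from stdout.
    print(" ".join(resolve_rack(host) for host in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Once a script like this is in place, running hdfs dfsadmin -printTopology should show which DataNodes the NameNode has placed under which rack.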

Data locality is implemented through cooperation between the YARN platform and the individual applications. When an ApplicationMaster requests containers, it specifies which nodes hold the data blocks it needs; the YARN scheduler understands these requests and tries to assign containers on those nodes, and the tasks then execute inside the assigned containers. For example, in MapReduce, the MapReduce ApplicationMaster knows that its HDFS blocks are on a specific set of machines, and it converts that information into container requests for those nodes so that it can take advantage of the data close to it.

YARN will then allocate the containers best suited to the tasks, as close as possible to the data that the tasks need.

Source - http://hortonworks.com/blog/discover-hdp-2-1-apache-hdfs-yarn/
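To make the locality levels concrete, here is a minimal sketch of how you could classify each task once you know the host it ran on (for example from the job history UI) and the hosts holding the replicas of its input block. This is not YARN's own code; the function names, host names and rack names are hypothetical and only illustrate the node-local / rack-local / off-rack distinction.

```python
from collections import Counter


def classify_locality(task_host, replica_hosts, rack_by_host):
    """Classify one task as node-local, rack-local or off-rack."""
    # Node-local: the task ran on a host that stores a replica of its block.
    if task_host in replica_hosts:
        return "node-local"
    # Rack-local: no replica on the host itself, but one on the same rack.
    task_rack = rack_by_host.get(task_host)
    replica_racks = {rack_by_host.get(host) for host in replica_hosts}
    if task_rack is not None and task_rack in replica_racks:
        return "rack-local"
    # Otherwise the data had to cross racks.
    return "off-rack"


def summarize(tasks, rack_by_host):
    """Count tasks per locality level.

    tasks is an iterable of (task_id, task_host, replica_hosts) tuples.
    """
    return Counter(
        classify_locality(task_host, replica_hosts, rack_by_host)
        for _task_id, task_host, replica_hosts in tasks
    )


# Hypothetical topology and task placements, for illustration only.
rack_by_host = {
    "node1": "/local-rack",
    "node2": "/local-rack",
    "node3": "/remote-rack",
    "node4": "/dr-rack",
}
tasks = [
    ("map_0", "node1", ["node1", "node3"]),  # replica on the same host
    ("map_1", "node2", ["node1", "node4"]),  # replica on the same rack
    ("map_2", "node4", ["node1", "node3"]),  # replicas only on other racks
]
summary = summarize(tasks, rack_by_host)
```

You may not need to compute this by hand: the Job Counters of a finished MapReduce job include "Data-local map tasks" and "Rack-local map tasks", which give the same breakdown for the map side.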

Hope this information helps!