Hadoop is designed to run compute (NodeManagers) as close to the data (DataNodes) as possible: containers for a job are preferably allocated on the nodes that already hold its input data, so that the data does not have to be copied across the network. Hence, in a typical Hadoop cluster, a DataNode and a NodeManager run side by side on each worker machine.
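To make the locality preference concrete, here is a minimal sketch of node-local placement. It assumes a hypothetical model in which each input block maps to the set of hosts holding a replica and each host offers a fixed number of free container slots. The function `place_tasks` and its inputs are illustrative, not part of any Hadoop API. The real YARN schedulers also consider rack-local placement and can delay scheduling to wait for a local slot; both are omitted here.

```python
from typing import Dict, Optional, Set


def place_tasks(block_hosts: Dict[str, Set[str]],
                free_slots: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Assign one task per input block, preferring a host that stores the block.

    block_hosts: block id -> hosts holding a replica (hypothetical model)
    free_slots:  host -> number of free container slots
    Returns block id -> chosen host, or None if no slot is left anywhere.
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is unchanged
    placement: Dict[str, Optional[str]] = {}
    for block in sorted(block_hosts):
        # 1. Node-local: a replica holder that still has a free slot.
        local = [h for h in sorted(block_hosts[block]) if slots.get(h, 0) > 0]
        if local:
            chosen: Optional[str] = local[0]
        else:
            # 2. Fallback: any host with a free slot; the block is read remotely.
            remote = [h for h in sorted(slots) if slots[h] > 0]
            chosen = remote[0] if remote else None
        if chosen is not None:
            slots[chosen] -= 1
        placement[block] = chosen
    return placement


blocks = {"blk_1": {"node1", "node2"}, "blk_2": {"node2"}, "blk_3": {"node1"}}
free = {"node1": 1, "node2": 1, "node3": 1}
print(place_tasks(blocks, free))
# {'blk_1': 'node1', 'blk_2': 'node2', 'blk_3': 'node3'}
```

Here two blocks run node-local, while `blk_3` falls back to `node3` because its only replica holder, `node1`, has already used its slot.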
The NodeManager is the worker (slave) process of the ResourceManager (RM), while the DataNode is the worker process of the NameNode, which is responsible for coordinating HDFS functions.
ResourceManager: runs as a master daemon and manages resource allocation in the cluster.
NodeManager: runs as a worker (slave) daemon on each worker node and is responsible for executing tasks on that node, which typically also runs a DataNode.
NodeManagers manage the containers requested by jobs; DataNodes manage the data.
The NodeManager (NM) is YARN’s per-node agent and takes care of an individual compute node in a Hadoop cluster. This includes keeping up to date with the ResourceManager (RM), managing the life cycle of containers, monitoring the resource usage (memory, CPU) of individual containers, tracking node health, managing logs, and running auxiliary services that different YARN applications may use. The NodeManager communicates directly with the ResourceManager.
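The container-monitoring duty can be illustrated with a toy per-node agent. This is a minimal sketch under stated assumptions: the classes `Container` and `NodeAgent`, their methods, and the heartbeat dictionary are hypothetical, and only memory is modelled. A real NodeManager samples usage itself, also tracks CPU, and can be configured to kill containers that exceed their memory limits.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    container_id: str
    memory_limit_mb: int   # memory granted to the container
    memory_used_mb: int = 0
    state: str = "RUNNING"


@dataclass
class NodeAgent:
    """Hypothetical per-node agent: tracks containers and enforces memory limits."""
    node_id: str
    containers: Dict[str, Container] = field(default_factory=dict)

    def launch(self, container_id: str, memory_limit_mb: int) -> None:
        self.containers[container_id] = Container(container_id, memory_limit_mb)

    def record_usage(self, container_id: str, memory_used_mb: int) -> None:
        # In this sketch the caller supplies usage; a real NM samples it periodically.
        self.containers[container_id].memory_used_mb = memory_used_mb

    def enforce_limits(self) -> List[str]:
        """Kill running containers that use more memory than they were granted."""
        killed = []
        for c in self.containers.values():
            if c.state == "RUNNING" and c.memory_used_mb > c.memory_limit_mb:
                c.state = "KILLED"
                killed.append(c.container_id)
        return killed

    def heartbeat(self) -> Dict[str, object]:
        """Status summary the agent would report to the ResourceManager."""
        running = [c for c in self.containers.values() if c.state == "RUNNING"]
        return {
            "node_id": self.node_id,
            "running_containers": sorted(c.container_id for c in running),
            "memory_used_mb": sum(c.memory_used_mb for c in running),
        }


nm = NodeAgent("node1")
nm.launch("c1", 1024)
nm.launch("c2", 512)
nm.record_usage("c1", 900)
nm.record_usage("c2", 700)   # c2 exceeds its 512 MB grant
print(nm.enforce_limits())   # ['c2']
print(nm.heartbeat())
```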
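The ResourceManager and the NameNode are both master components (processes) that can run in a single-instance or a high-availability (HA) setup. They should run on separate servers (nodes), usually identical to one another and of higher specification than the DataNodes. ZooKeeper is another important component: in an HA setup, it is used to coordinate automatic failover between the active and standby masters.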
The ResourceManager and the NodeManager together form the data-computation framework.
ResourceManager acts as the scheduler and allocates resources amongst all the applications in the system.
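One way to picture how a scheduler divides resources among applications is max-min fairness: every application is offered an equal share, and whatever an application does not need is redistributed to the others. The sketch assumes a single resource (memory in MB) and a hypothetical function `max_min_fair_share`. YARN's pluggable schedulers (such as the Capacity Scheduler and the Fair Scheduler) add queues, weights, and multiple resource types, so treat this as an illustration of the idea, not of either scheduler's code.

```python
from typing import Dict


def max_min_fair_share(capacity_mb: int, demands_mb: Dict[str, int]) -> Dict[str, int]:
    """Split cluster memory among applications with max-min fairness (integer MB)."""
    alloc = {app: 0 for app in demands_mb}
    remaining = capacity_mb
    unsatisfied = {app for app, d in demands_mb.items() if d > 0}
    while remaining > 0 and unsatisfied:
        # Offer every still-hungry application an equal slice of what is left.
        share = remaining // len(unsatisfied)
        if share == 0:
            break  # leftover is smaller than 1 MB per application
        for app in sorted(unsatisfied):
            give = min(share, demands_mb[app] - alloc[app])
            alloc[app] += give
            remaining -= give
        # Applications whose demand is now met drop out of the next round.
        unsatisfied = {app for app in unsatisfied if alloc[app] < demands_mb[app]}
    return alloc


print(max_min_fair_share(12000, {"a": 2000, "b": 6000, "c": 8000}))
# {'a': 2000, 'b': 5000, 'c': 5000}
```

Application `a` needs less than an equal third, so it is fully satisfied, and the 2,000 MB it leaves unused is split evenly between `b` and `c`.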
The NodeManager runs on each node in the cluster and takes direction from the ResourceManager; it manages the resources available on that single node.
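Managing a single node's resources can be sketched as capacity accounting over two dimensions, memory and vcores: a container is admitted only if both fit into what is still free. The `Resource` and `NodeResources` classes are hypothetical. In a real cluster, the capacity a NodeManager advertises is set through properties such as `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int

    def __add__(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb + other.memory_mb, self.vcores + other.vcores)

    def __sub__(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb - other.memory_mb, self.vcores - other.vcores)

    def fits_in(self, other: "Resource") -> bool:
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores


class NodeResources:
    """Hypothetical accounting of one node's capacity versus what is allocated."""

    def __init__(self, capacity: Resource) -> None:
        self.capacity = capacity
        self.allocated = Resource(0, 0)

    def available(self) -> Resource:
        return self.capacity - self.allocated

    def allocate(self, request: Resource) -> bool:
        """Reserve a container's resources if both dimensions fit; else refuse."""
        if not request.fits_in(self.available()):
            return False
        self.allocated = self.allocated + request
        return True

    def release(self, request: Resource) -> None:
        """Return a finished container's resources to the node."""
        if not request.fits_in(self.allocated):
            raise ValueError("releasing more than is allocated")
        self.allocated = self.allocated - request


node = NodeResources(Resource(memory_mb=8192, vcores=8))
print(node.allocate(Resource(4096, 2)))  # True
print(node.allocate(Resource(4096, 2)))  # True
print(node.allocate(Resource(1024, 1)))  # False: no memory left
print(node.available())                  # Resource(memory_mb=0, vcores=4)
```

Note that the third request is refused even though four vcores are still free: a container must fit in every dimension at once.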
The ApplicationMaster, a framework-specific library, is responsible for running a specific YARN job: it negotiates resources from the ResourceManager and works with the NodeManagers to execute and monitor containers.
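The negotiate-launch-monitor cycle of an ApplicationMaster can be sketched as a loop. Everything here is a hypothetical stand-in, not the YARN client API: `ToyResourceManager` grants containers from per-node free slots, and the `launch` callback plays the role of a NodeManager running a task and reporting success or failure. Failed tasks are simply requested again in a later round, up to `max_rounds`.

```python
from typing import Callable, Dict, List, Tuple


class ToyResourceManager:
    """Grants containers from per-node free slots (hypothetical, not the YARN API)."""

    def __init__(self, free_slots: Dict[str, int]) -> None:
        self.free_slots = dict(free_slots)
        self._next_id = 0

    def allocate(self, count: int) -> List[Tuple[str, str]]:
        """Grant up to `count` containers as (container_id, node) pairs."""
        granted: List[Tuple[str, str]] = []
        for node in sorted(self.free_slots):
            while self.free_slots[node] > 0 and len(granted) < count:
                self.free_slots[node] -= 1
                self._next_id += 1
                granted.append((f"container_{self._next_id}", node))
        return granted

    def release(self, node: str) -> None:
        self.free_slots[node] += 1


def run_application(rm: ToyResourceManager, tasks: List[str],
                    launch: Callable[[str, str], bool],
                    max_rounds: int = 5) -> Dict[str, str]:
    """Toy ApplicationMaster loop: negotiate, launch, monitor, retry failures."""
    pending = list(tasks)
    done: Dict[str, str] = {}
    for _ in range(max_rounds):
        if not pending:
            break
        # 1. Negotiate: ask the RM for one container per pending task.
        grants = rm.allocate(len(pending))
        next_pending = pending[len(grants):]  # tasks that got no container
        # 2. Launch each task on its node and record the outcome.
        for task, (_container_id, node) in zip(pending, grants):
            if launch(task, node):
                done[task] = node
            else:
                next_pending.append(task)  # retry in a later round
            rm.release(node)  # the container has finished; free its slot
        pending = next_pending
    if pending:
        raise RuntimeError(f"tasks did not complete: {pending}")
    return done


failed_once = set()


def flaky_launch(task: str, node: str) -> bool:
    """Simulated NodeManager: task t2 fails on its first attempt only."""
    if task == "t2" and task not in failed_once:
        failed_once.add(task)
        return False
    return True


rm = ToyResourceManager({"node1": 1, "node2": 1})
print(run_application(rm, ["t1", "t2", "t3"], flaky_launch))
```

Because the toy RM grants containers synchronously and every container finishes within its round, this loop is far simpler than a real ApplicationMaster, which keeps heartbeating to the RM while containers start and complete at different times.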