My cluster has 2 nodes (see the image below).
When I build cubes in Kylin, several jobs run on YARN at the same time. But when I check each job ID in the YARN ResourceManager UI, I see all jobs running on the first node (HDP02). Jobs only run on the second node (HDP03) once the first node has used all of its available RAM (4 GB).
I want the load balanced across the 2 nodes. I mean that while jobs are running, they should run on both nodes to share resources: some jobs on the first node (HDP02) and some on the second node (HDP03) at the same time.
So is there any configuration to balance the load between the two nodes?
I can only guess, but possibly the data used by the jobs is located only on this DataNode (not yet replicated, or the replication factor is 1). So, to reduce network operations and use 'short-circuit' local reads, YARN uses the resources of this node first.
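To make that guess concrete, here is a small toy model in Python. It is not YARN's scheduler code, only a sketch of a "preferred node first, spill over when full" rule. The container size (1024 MB), the 4096 MB assumed for HDP03, and the names `Node` and `place_locality_first` are all assumptions made up for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_mb: int
    containers: list = field(default_factory=list)


def place_locality_first(nodes, requests):
    """Place each (job_id, memory_mb, preferred_node) request.

    Toy rule (an assumption, not YARN's real algorithm): use the
    preferred node if it has room, otherwise fall back to the first
    other node with enough free memory.
    """
    by_name = {n.name: n for n in nodes}
    placement = {}
    for job_id, memory_mb, preferred in requests:
        candidates = [by_name[preferred]] + [n for n in nodes if n.name != preferred]
        for node in candidates:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                node.containers.append(job_id)
                placement[job_id] = node.name
                break
        else:
            # No node has enough free memory; the request stays pending.
            placement[job_id] = None
    return placement


# HDP02 has 4 GB as in the question; 4 GB for HDP03 is assumed.
nodes = [Node("HDP02", 4096), Node("HDP03", 4096)]
# Assumption: every job's input data lives only on HDP02.
requests = [(f"job{i}", 1024, "HDP02") for i in range(1, 7)]

placement = place_locality_first(nodes, requests)
for job_id, node_name in placement.items():
    print(job_id, "->", node_name)
```

Under these assumptions the first four jobs land on HDP02 and only the remaining two spill over to HDP03, which matches the behaviour described in the question.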
Unfortunately not. At this point in time, resource allocation does not consider the utilization of the individual NodeManagers (NMs) in the cluster and does not distribute the load evenly. These points are discussed under the umbrella JIRA below: