How does YARN distribute containers across NodeManagers when deploying Spark applications?

Contributor

When I deploy Spark applications on YARN, how does YARN decide which NodeManagers the containers run on?

If I deploy 5 applications, each with 2 executors and 1 driver, how does YARN distribute those containers, and is there a specific placement algorithm? In my case the containers are not spread evenly: I have 6 NodeManagers running, and I would like the executors to land on all of them so the load is balanced. Can anyone tell me how to achieve this?


Contributor

@Rohit Khose YARN is the resource negotiator for your cluster. Spark (like other Hadoop applications) requests the resources specified by the user from YARN and uses them if they are available.

You can enable Spark dynamic allocation so that the application scales its number of executors up or down depending on the workload.

https://spark.apache.org/docs/1.6.1/configuration.html#dynamic-allocation
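As a rough illustration, here is a minimal sketch that builds a `spark-submit` command line for YARN with dynamic allocation turned on. The jar name `my-app.jar`, the class `com.example.MyApp`, and the executor bounds are hypothetical placeholders. For the Spark version linked above, dynamic allocation on YARN also needs the external shuffle service (`spark.shuffle.service.enabled`) running on every NodeManager. Note that dynamic allocation changes how many executors an application holds, not which NodeManagers they are placed on.

```python
def build_spark_submit(app_jar, main_class, min_executors=1, max_executors=6):
    """Build a spark-submit argument list for YARN with dynamic allocation enabled.

    app_jar and main_class are caller-supplied placeholders; the executor bounds
    are illustrative defaults, not recommendations.
    """
    confs = {
        # Let Spark add or remove executors based on pending tasks.
        "spark.dynamicAllocation.enabled": "true",
        # Required for dynamic allocation on YARN in Spark 1.x/2.x: executors can be
        # removed without losing their shuffle files.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster",
            "--class", main_class]
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_jar)
    return args


if __name__ == "__main__":
    # Hypothetical application jar and main class.
    print(" ".join(build_spark_submit("my-app.jar", "com.example.MyApp")))
```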

Contributor

Thanks for the information, but what I want to know is this: after I deploy applications on YARN, how does YARN place containers (executors) on NodeManagers (workers)? Which property or algorithm controls this placement, and where can I change it?

Explorer

The ResourceManager launches an ApplicationMaster for each application/job, and the ApplicationMaster is responsible for the lifetime of that application. The ApplicationMaster negotiates with the ResourceManager for resources and then has the granted containers launched on NodeManagers.
I am looking for a way to allocate a container on a specific DataNode. How can I do that?
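One way Spark on YARN can pin containers to particular nodes is YARN node labels, exposed through the Spark properties `spark.yarn.am.nodeLabelExpression` and `spark.yarn.executor.nodeLabelExpression`. The sketch below only builds those `--conf` arguments. It assumes node labels are already enabled on the cluster (`yarn.node-labels.enabled`), that a label (here the hypothetical `fastnodes`) has been created and assigned to the target nodes (for example with `yarn rmadmin -addToClusterNodeLabels` and `yarn rmadmin -replaceLabelsOnNode`), and that your queue is allowed to use that label. Support depends on your Hadoop version and scheduler, so check your distribution's documentation.

```python
def node_label_confs(label):
    """Spark-on-YARN settings that ask YARN to place the AM and executors
    only on nodes carrying `label` (label assumed to exist already)."""
    return {
        "spark.yarn.am.nodeLabelExpression": label,
        "spark.yarn.executor.nodeLabelExpression": label,
    }


def to_conf_args(confs):
    """Turn a dict of Spark properties into spark-submit --conf arguments."""
    args = []
    for key, value in sorted(confs.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # "fastnodes" is a hypothetical label name.
    print(" ".join(to_conf_args(node_label_confs("fastnodes"))))
```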

Please see the following link for details:
https://community.hortonworks.com/questions/203537/container-allocation-by-application-master-in-had...

If you have found a solution, please share it.

Contributor

https://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/

The link above gives a good overview of how YARN works and of the scheduling algorithms (Capacity Scheduler and Fair Scheduler) that the ResourceManager uses.
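To see why containers can pile up on a few NodeManagers, it helps to know that the Capacity and Fair Schedulers typically hand out containers when a NodeManager heartbeats in, and a single heartbeat may receive several containers. The following is a deliberately simplified toy model, not YARN's actual scheduling code: it ignores queues, locality, container sizes, and asynchronous scheduling, and treats all 15 containers (5 apps × 1 driver + 2 executors) as one pending queue. The function name and the "8 slots per node" figure are hypothetical.

```python
from collections import Counter


def simulate(num_nodes, slots_per_node, pending, max_per_heartbeat, rounds=10):
    """Toy model of heartbeat-driven placement.

    Nodes heartbeat in a fixed order; on each heartbeat the scheduler places up to
    max_per_heartbeat pending containers on that node (None means no per-heartbeat
    limit, only the node's free slots). Returns a Counter of containers per node.
    """
    free = {f"nm{i}": slots_per_node for i in range(1, num_nodes + 1)}
    placed = Counter()
    for _ in range(rounds):
        for node in free:
            if pending == 0:
                return placed
            limit = free[node] if max_per_heartbeat is None else min(free[node], max_per_heartbeat)
            n = min(limit, pending)
            if n:
                free[node] -= n
                placed[node] += n
                pending -= n
    return placed


if __name__ == "__main__":
    # No per-heartbeat limit: the first nodes to heartbeat absorb everything.
    print(simulate(6, 8, 15, None))
    # One container per heartbeat: placement spreads across all six nodes.
    print(simulate(6, 8, 15, 1))
```

In this model, the unlimited case puts 8 containers on `nm1` and 7 on `nm2`, while limiting each heartbeat to one container spreads them as 3, 3, 3, 2, 2, 2 across the six nodes.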

A tutorial on configuring the YARN Capacity Scheduler is available at https://hortonworks.com/hadoop-tutorial/configuring-yarn-capacity-scheduler-ambari/
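The scheduler itself is chosen with `yarn.resourcemanager.scheduler.class` in `yarn-site.xml`. On Hadoop versions whose Capacity Scheduler supports it, `yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled` (in `capacity-scheduler.xml`) controls whether one NodeManager heartbeat can receive more than one container; turning it off tends to spread containers more evenly, at the cost of slower allocation. The sketch below only renders these properties in Hadoop's `<configuration>` XML format. Treat the property availability as an assumption to verify against your Hadoop version; on an Ambari-managed cluster you would normally set them through the Ambari YARN configs instead of editing files by hand.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(props):
    """Render a dict of Hadoop properties as a <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# yarn-site.xml: select the Capacity Scheduler.
yarn_site = {
    "yarn.resourcemanager.scheduler.class":
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler",
}

# capacity-scheduler.xml: assume this Hadoop version supports the setting;
# "false" means at most one container per NodeManager heartbeat.
capacity_scheduler = {
    "yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled": "false",
}

if __name__ == "__main__":
    print(hadoop_config_xml(yarn_site))
    print(hadoop_config_xml(capacity_scheduler))
```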

Does this help?