Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

On what basis Application Master decides that it needs more containers

avatar
Rising Star

1. On what basis Application Master decides that it needs more containers?

2. Will each mapper have separate containers?

3. Let's say one mapper launched in container and mapper completed 20% of work and if it requires more resources to complete remaining 80% of the task then how the resources will be allocated and who will allocate? If Distribution happens between containers then how it will happen?

1 ACCEPTED SOLUTION

avatar
Master Guru

1. On what basis Application Master decides that it needs more containers?

Depends completely on the application master. For example in Pig/Hive he computes the number of mappers based on input splits ( blocks ) so if you have a 1GB file with 10 blocks he will ask for 10 map containers. If you then specified 5 reducers the application master will then ask for 5 containers for the reducers. This is a different configuration for each type of yarn workload. One example of "dynamic" requests are in slider which is a framework you can "flex" up or down based on command line. But again in the end the user tells slider to request more.

There is no magic scaling inherent to yarn it depends on your application.

2. Will each mapper have separate containers?

In classic MapReduce One map = 1 container. ( Tez on the other hand has container reuse so a container it asked for for a "map" task will then be used for a reducer for example ). And finally we will soon have LLAP which can run multiple map/reduce task in the same container as threads similar to a Spark Executor. So all is possible.

3. Let's say one mapper launched in container and mapper completed 20% of work and if it requires more resources to complete remaining 80% of the task then how the resources will be allocated and who will allocate? If Distribution happens between containers then how it will happen?

Again depends. Mapreduce is stupid it knows in advance how much work there is ( number of blocks ) and then asks for the number of mappers/containers it needs and distributes the work between them. For the reducers hive/tez for example can compute the number of reducers it needs based on the output size of the mappers. But once the reducer stage is started it doesnt change that anymore. So your question is not really correct.

Summary: In the end you assume Yarn would automatically scale containers on need but that is not what happens. What really happens is that the different workloads in yarn predict how much containers they need based on file sizes/ map output etc. and then ask for the correct amount of containers for a stage. There is no scaling up/down in a single task normally. What is dynamic is yarn providing containers. So if an application master asks for 1000 tasks and there are only 200 slots and some occupied by other tasks. Yarn can provide them to the application master piece by piece. Some application masters like Mapreduce are fine with that. Other application masters like spark will not start processing until they got all the containers running they requested at the same time. Again it depends on the application master.

Now there is nothing prohibiting a Application Master to do this if he wanted to do that but its not what happens in reality for most of the workloads like Tez/MapReduce/Spark/ ... The only dynamic scaling I am aware of is in pig/hive between stages as in the application master predicts how many containers it needs for the reducer stage based on the size of the map output.

View solution in original post

6 REPLIES 6

avatar
Super Collaborator

Hi @srinivasa rao,

Answer1: Application master negotiates with the Resource Manager for the resources not for the containers. Container can be assumed as a box with resources for running an application. Resources are negotiated with resource manager through resource manager protocol by Application Master based on the User-code.

Since it is essentially user-code, do not trust the ApplicationMaster(s) i.e. any ApplicationMaster is not a privileged service.

The YARN system (ResourceManager and NodeManager) has to protect itself from faulty or malicious ApplicationMaster(s) and resources granted to them at all costs.

Answer2: each job is performed within a Container. it could be multiple jobs or one job thats been done in a container based on the resources granted by the RM through AM.

Answer3: the internals of how the resources are allocated or scheduled is always taken care by Resource Manager. May be 20% or rest of the 80% always it the job of the Resource Manager to allocate the resources to the Application Master working along with the node manager on that particular Node. Its always the responsibility of Node Manager and Resource Manager to check the status of the resources allocated.

Hope that help.

For more information here is the article which explains in simple terms.

http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/

Thanks,

Sujitha

avatar
Rising Star

Hi Sujitha,

I am not satisfying with the answers which you provided

1. Without executing the user-code how Application master will come to know how much of resource its required?

2. On what basis Resource manager will take care of resource allocation

avatar
Expert Contributor

Hi,

Once it get a request from client to YARN will co-ordinate for execution.

During YARN job it has RM and Application Master, While creating application master it will ask for Node manager to provide the best available memory and cores as default mentioned in .xml file.

Once it get information from initial memory and cores , then container will be created and it depends on what type of scheduler your are using for YARN. It has 1:FIFO 2:Fair 3:Capacity Scheduler By default it uses the Fair scheduler for the JOB.

avatar
Super Collaborator

Hi @srinivasa rao,

glad that you are satisfied by the answer provided by Benjamin Leonhardi. Please let me know in case of any issues.

avatar
Master Guru

1. On what basis Application Master decides that it needs more containers?

Depends completely on the application master. For example in Pig/Hive he computes the number of mappers based on input splits ( blocks ) so if you have a 1GB file with 10 blocks he will ask for 10 map containers. If you then specified 5 reducers the application master will then ask for 5 containers for the reducers. This is a different configuration for each type of yarn workload. One example of "dynamic" requests are in slider which is a framework you can "flex" up or down based on command line. But again in the end the user tells slider to request more.

There is no magic scaling inherent to yarn it depends on your application.

2. Will each mapper have separate containers?

In classic MapReduce One map = 1 container. ( Tez on the other hand has container reuse so a container it asked for for a "map" task will then be used for a reducer for example ). And finally we will soon have LLAP which can run multiple map/reduce task in the same container as threads similar to a Spark Executor. So all is possible.

3. Let's say one mapper launched in container and mapper completed 20% of work and if it requires more resources to complete remaining 80% of the task then how the resources will be allocated and who will allocate? If Distribution happens between containers then how it will happen?

Again depends. Mapreduce is stupid it knows in advance how much work there is ( number of blocks ) and then asks for the number of mappers/containers it needs and distributes the work between them. For the reducers hive/tez for example can compute the number of reducers it needs based on the output size of the mappers. But once the reducer stage is started it doesnt change that anymore. So your question is not really correct.

Summary: In the end you assume Yarn would automatically scale containers on need but that is not what happens. What really happens is that the different workloads in yarn predict how much containers they need based on file sizes/ map output etc. and then ask for the correct amount of containers for a stage. There is no scaling up/down in a single task normally. What is dynamic is yarn providing containers. So if an application master asks for 1000 tasks and there are only 200 slots and some occupied by other tasks. Yarn can provide them to the application master piece by piece. Some application masters like Mapreduce are fine with that. Other application masters like spark will not start processing until they got all the containers running they requested at the same time. Again it depends on the application master.

Now there is nothing prohibiting a Application Master to do this if he wanted to do that but its not what happens in reality for most of the workloads like Tez/MapReduce/Spark/ ... The only dynamic scaling I am aware of is in pig/hive between stages as in the application master predicts how many containers it needs for the reducer stage based on the size of the map output.

avatar
Expert Contributor

Very neatly explained.!