How to tune yarn.scheduler.capacity.maximum-am-resource-percent in Hadoop clusters

We are running a Spark Streaming app that consumes data from Kafka topics.

We want to know how to tune the parameter `yarn.scheduler.capacity.maximum-am-resource-percent`.

According to the documentation:

yarn.scheduler.capacity.maximum-am-resource-percent: Maximum percent of resources in the cluster which can be used to run application masters, i.e. it controls the number of concurrently running applications.

In some documents we have even seen a recommendation to raise it to `90 percent` for best results, but the default is `10%`.

So my question is: do we need to tune this parameter according to cluster size?

Or what is the best practice for getting good results?

Michael-Bronson
1 ACCEPTED SOLUTION

The right value for this property depends entirely on your use case.

yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent: queue-level AM share

For instance, say your cluster is used primarily for Oozie. For each Oozie action (except the SSH action) you get an Oozie launcher application (a map-only job that starts the real job) and an external application that actually does the work. You therefore need to run many applications and, in turn, many application masters. To get more parallelism in that case, create a dedicated queue for the launcher applications (oozie.launcher.mapred.job.queue.name can be used to direct all launcher applications to that queue) and another queue for the external applications. You can then set the launcher queue to 0.5: each launcher has a single AM and a single mapper, so an equal split between AMs and mappers is a reasonable setting.

At cluster level, yarn.scheduler.capacity.maximum-am-resource-percent: say you have the capacity to run 1000 containers and each application runs 10 mappers on average. Setting the value to 10% (0.1) lets you run 100 applications in parallel (100 application masters and 900 mappers). Setting it to 20% (0.2) lets you run 200 applications in parallel (200 application masters and 800 mapper containers), but those 800 mapper containers cover only about 4 of the 10 mappers each application wants at once, so applications wait for containers released by others and the average completion time of each application gets longer.
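The arithmetic in this answer can be sketched as a small calculator. This is only a rough illustration, not an official tool: the function name `estimate_am_share` and its fields are made up for this example, and it assumes every container (AM or task) has the same size, whereas YARN actually enforces the AM limit in terms of resources such as memory and vcores.

```python
import math
from dataclasses import dataclass


@dataclass
class AmShareEstimate:
    max_concurrent_apps: int   # application masters that fit in the AM share
    task_containers: int       # containers left for tasks when all those AMs run
    tasks_per_app: float       # average task containers each app can hold at once
    shortfall_per_app: float   # wanted task containers each app cannot get at once


def estimate_am_share(total_containers: int,
                      am_percent: float,
                      tasks_wanted_per_app: int) -> AmShareEstimate:
    """Hypothetical helper: estimate concurrency for a given AM share.

    Assumption: all containers are equally sized, so the AM share can be
    counted in containers instead of memory/vcores.
    """
    if not 0.0 < am_percent <= 1.0:
        raise ValueError("am_percent must be a fraction in (0, 1], e.g. 0.1")
    if total_containers <= 0 or tasks_wanted_per_app <= 0:
        raise ValueError("container counts must be positive")

    # Small epsilon guards against float error such as 0.29 * 100 = 28.99...
    max_apps = math.floor(total_containers * am_percent + 1e-9)
    task_containers = total_containers - max_apps

    if max_apps == 0:
        # AM share too small for even one AM in this simplified model.
        return AmShareEstimate(0, task_containers, 0.0,
                               float(tasks_wanted_per_app))

    # An app never holds more task containers than it asked for.
    tasks_per_app = min(float(tasks_wanted_per_app),
                        task_containers / max_apps)
    shortfall = tasks_wanted_per_app - tasks_per_app
    return AmShareEstimate(max_apps, task_containers, tasks_per_app, shortfall)


if __name__ == "__main__":
    # Cluster-level example from the answer: 1000 containers, 10 mappers per app.
    for pct in (0.1, 0.2):
        print(pct, estimate_am_share(1000, pct, 10))
    # Launcher-queue example: one AM plus one mapper per app, AM share 0.5.
    # The queue size of 100 containers is an arbitrary illustrative figure.
    print(0.5, estimate_am_share(100, 0.5, 1))
```

Under these assumptions, the 10% case gives 100 applications with 9 mapper containers each on average, the 20% case gives 200 applications with 4 each, and the launcher queue at 0.5 gives every launcher its single mapper with no shortfall. Comparing such estimates for a few candidate values against the real application mix is one way to pick a starting point before checking actual utilisation.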

Do you know whether a calculator for this parameter exists anywhere?

I mean, say we have different clusters running different applications and we want to tune this value to our needs: is there perhaps a script that works out the right setting?

Michael-Bronson

Sorry, I have not come across any such script yet. For observability, you can review the cluster utilisation report to understand how the weighting influenced the load. More details are at this link: https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/admin_cluster_util_report.html#conc...