
YARN Queue Capacity Scheduling

Rising Star

Has anyone come across the following scenario:

I launch 5 YARN jobs (requiring variable resources) in the same queue in this order:

job1, job2, job3, job4, job5

I’ve configured a Capacity Scheduler with FIFO ordering.

Observed behavior:

job1 runs

job2-job5 are in a waiting state

once job1 completes, job2-job5 run in a random order (job4, job2, job3, job5)

Is this expected?

1 ACCEPTED SOLUTION

Super Guru

@Nasheb Ismaily

Yes, this is expected with the FIFO policy: jobs are executed in the order in which you submitted them. You also have the option to use the FAIR policy, in which case all jobs share the available resources fairly and don't have to wait for one another. They will still start in the order you submitted them, but depending on what they do, they may finish in a different order. That assumes your cluster has enough resources and that this is the behavior you want.
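To make that concrete, here is a minimal sketch of switching a queue's ordering policy to FAIR in capacity-scheduler.xml. It assumes a release that exposes the per-queue ordering-policy property (recent Hadoop 2.x / HDP releases), and the queue path root.default is only an example:

<!-- capacity-scheduler.xml: per-queue ordering policy (root.default is an example queue path) -->
<property>
  <name>yarn.scheduler.capacity.root.default.ordering-policy</name>
  <value>fair</value>
</property>

With fair ordering, applications in the queue share its capacity instead of waiting strictly behind the application that was submitted first; with fifo they run in submission order.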

I did not include references to various documents because they were already provided and are widely available.


REPLIES


@Nasheb Ismaily: Yes. If job1 is using all the containers, then job2-job5 have to wait; otherwise, job2 will pick up the remaining containers and start executing.

Rising Star

Thanks Sridhar, but what if job3 runs after job1, and then job4 runs, and then job2 runs?


In FIFO, jobs will execute in the order in which you submitted them. Refer: http://hortonworks.com/blog/understanding-apache-hadoops-capacity-scheduler/

Super Collaborator

@Nasheb Ismaily, from what I have read, applications are kept in FIFO order by their time of submission. If you submit them back to back very quickly, is it possible the timestamps are identical and they arrived at the "same time"?

Super Collaborator

@Nasheb Ismaily, double-check your configuration. I know you already know this, but for the sake of a complete answer, here's how to configure FIFO.

Capacity Scheduler queues can be configured for FIFO or FAIR ordering via Ambari's YARN Queue Manager (top-right button). The default is FIFO.

Via Ambari - Yarn Capacity Scheduler Queue configuration:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_performance_tuning/content/section_creat...

Manually:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_yarn_resource_mgt/content/flexible_sched...
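Before tweaking queue properties, it can also be worth confirming which scheduler the ResourceManager is actually running, since the Capacity Scheduler and Fair Scheduler are configured in different files. A sketch of the yarn-site.xml property to check (the CapacityScheduler class shown here is the usual default on HDP):

<!-- yarn-site.xml: selects the scheduler implementation the ResourceManager uses -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>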

Also, the Yarn Fair Scheduler can be configured for FIFO:

https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html

“schedulingPolicy: to set the scheduling policy of any queue. The allowed values are “fifo”/“fair”/“drf” or any class that extends”
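For completeness, a rough sketch of what that looks like in the Fair Scheduler's allocation file (the file pointed to by yarn.scheduler.fair.allocation.file; the queue name "default" is only an example):

<?xml version="1.0"?>
<allocations>
  <queue name="default">
    <!-- schedule applications in this queue in submission order -->
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>
</allocations>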
