​Yarn Queue Capacity Scheduling

Has anyone come across the following scenario:

I launch 5 YARN jobs (requiring variable resources) in the same queue in this order:

job1, job2, job3, job4, job5

I’ve configured a Capacity Scheduler with FIFO ordering.

Observed behavior:

job1 runs

job2 through job5 stay in the waiting state

once job1 completes, job2 through job5 run in a seemingly random order (job4, job2, job3, job5)

Is this expected?

ACCEPTED SOLUTION

Re: ​Yarn Queue Capacity Scheduling

@Nasheb Ismaily

Yes, this is expected under the FIFO policy. If you set a FIFO policy, jobs are executed in the order in which you submitted them. You also have the option to use the FAIR policy. In that case, all jobs share the available resources fairly and do not have to wait for one another. They still start in the order you submitted them but, depending on the work they do, they may finish in a different order. This assumes that your cluster has enough resources and that this is the behavior you want by design.

I did not include references to various documents because they were already provided and are widely available.
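
To illustrate the point about FAIR sharing, here is a toy model in Python. It is only a sketch under simplifying assumptions and not how YARN is implemented: every active job receives an equal slice of a fixed capacity on each tick, and the job names, work units, and the `fair_share_finish_order` helper are hypothetical.

```python
def fair_share_finish_order(work, capacity):
    """Return the order in which jobs finish under equal sharing.

    work: dict mapping job name -> units of work, in submission order.
    capacity: units of work the queue can process per tick.
    """
    remaining = dict(work)
    finished = []
    while remaining:
        # Every job still running gets the same share of this tick.
        share = capacity / len(remaining)
        for job in list(remaining):
            remaining[job] -= share
            if remaining[job] <= 1e-9:
                finished.append(job)
                del remaining[job]
    return finished


# All three jobs start together, but the smaller ones finish first.
order = fair_share_finish_order({"job1": 8, "job2": 2, "job3": 4}, capacity=6)
print(order)
```

In this model all jobs begin on the first tick, yet the completion order depends on how much work each job has, not on when it was submitted.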

6 REPLIES

Re: ​Yarn Queue Capacity Scheduling

@Nasheb Ismaily: Yes. If job1 is using all the containers, then job2 through job5 have to wait. Otherwise, job2 will pick up the remaining containers and start executing.
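
A rough way to picture this is the following Python sketch. It assumes a queue with a fixed number of free containers that are handed out strictly in submission order. The container counts and the `fifo_allocate` helper are made up for illustration and do not reflect the actual scheduler code.

```python
def fifo_allocate(free_containers, requests):
    """Hand out containers in submission order until none are left.

    requests: list of (job, containers_needed) in submission order.
    Returns (running, waiting): containers granted per running job,
    and the jobs that received nothing and must wait.
    """
    running = {}
    waiting = []
    for job, needed in requests:
        granted = min(needed, free_containers)
        if granted > 0:
            running[job] = granted
            free_containers -= granted
        else:
            waiting.append(job)
    return running, waiting


# job1 takes every container, so the rest wait.
print(fifo_allocate(10, [("job1", 10), ("job2", 4), ("job3", 2)]))
# job1 leaves 4 containers free, so job2 picks them up and starts.
print(fifo_allocate(10, [("job1", 6), ("job2", 8), ("job3", 2)]))
```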

Re: ​Yarn Queue Capacity Scheduling

Thanks Sridhar, but what if job3 runs after job1, and then job4 runs, and then job2 runs?

Re: ​Yarn Queue Capacity Scheduling

In FIFO, jobs execute in the order in which you submitted them. Refer to: http://hortonworks.com/blog/understanding-apache-hadoops-capacity-scheduler/

Re: ​Yarn Queue Capacity Scheduling

@Nasheb Ismaily, according to what I read, applications are ordered FIFO by time of submission. If you submit them back to back very quickly, is it possible that the timestamps are identical, so that the jobs appear to have arrived at the same time?
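
One way to check for this on your side is to compare the submission times of the jobs. The Python sketch below assumes you have collected each job's submission time in milliseconds (for example from your own launch logs). The sample values and the `timestamp_ties` helper are hypothetical, and whether the scheduler really breaks ties this way is not something this snippet can confirm.

```python
from collections import defaultdict


def timestamp_ties(submissions):
    """Group jobs that share the same submission timestamp.

    submissions: list of (job, submit_time_ms).
    Returns a list of job groups, one per timestamp with more than one job.
    """
    by_time = defaultdict(list)
    for job, submitted_at in submissions:
        by_time[submitted_at].append(job)
    return [jobs for _, jobs in sorted(by_time.items()) if len(jobs) > 1]


# Made-up timestamps: job2, job3 and job4 land in the same millisecond.
sample = [("job1", 1000), ("job2", 1001), ("job3", 1001),
          ("job4", 1001), ("job5", 1002)]
print(timestamp_ties(sample))
```

If groups like this show up, ordering by timestamp alone cannot distinguish those jobs.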

Re: ​Yarn Queue Capacity Scheduling

@Nasheb Ismaily, double-check your configuration. I know you already know this, but for the sake of a complete answer, here's how to configure FIFO.

The Capacity Scheduler queues can be configured with a FIFO or fair ordering policy via Ambari's YARN Queue Manager (top right button). The default is FIFO.

Via Ambari - Yarn Capacity Scheduler Queue configuration:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_performance_tuning/content/section_creat...

Manually:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_yarn_resource_mgt/content/flexible_sched...
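
For the manual route, the per-queue setting in the Capacity Scheduler configuration is `yarn.scheduler.capacity.<queue-path>.ordering-policy`. The Python sketch below just renders that property as an XML fragment so you can compare it with what is in your `capacity-scheduler.xml`. The queue path `root.default` and the `ordering_policy_property` helper are assumptions for illustration, so substitute your own queue.

```python
import xml.etree.ElementTree as ET


def ordering_policy_property(queue_path, policy):
    """Render the ordering-policy property for one leaf queue as XML."""
    if policy not in ("fifo", "fair"):
        raise ValueError(f"unsupported ordering policy: {policy}")
    prop = ET.Element("property")
    name = f"yarn.scheduler.capacity.{queue_path}.ordering-policy"
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = policy
    return ET.tostring(prop, encoding="unicode")


# "root.default" is a placeholder queue path.
print(ordering_policy_property("root.default", "fifo"))
```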

Also, the YARN Fair Scheduler can be configured for FIFO:

https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html

“schedulingPolicy: to set the scheduling policy of any queue. The allowed values are “fifo”/“fair”/“drf” or any class that extends”
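
As a sketch of where that element goes, the Python snippet below builds a minimal Fair Scheduler allocation file with one queue whose `schedulingPolicy` is `fifo`. The queue name `etl` and the `fair_scheduler_allocations` helper are hypothetical, and a real allocation file will normally carry more settings than this.

```python
import xml.etree.ElementTree as ET


def fair_scheduler_allocations(queue_name, policy):
    """Build a minimal allocation file with one queue and its policy."""
    root = ET.Element("allocations")
    queue = ET.SubElement(root, "queue", name=queue_name)
    ET.SubElement(queue, "schedulingPolicy").text = policy
    return ET.tostring(root, encoding="unicode")


# "etl" is a placeholder queue name.
print(fair_scheduler_allocations("etl", "fifo"))
```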
