Spark jobs are stuck under YARN Fair Scheduler

Contributor

Hi, 

 

I have set up the YARN Fair Scheduler in Ambari (HDP 3.1.0.0-78) for the "default" queue only. So far, I haven't added any new queues. 

 

Now, when I submit a simple job against this queue, the application stays in the "ACCEPTED" state forever. I get the message below in the YARN logs:

 

Additional information is given below. Please help me fix this issue as soon as possible. 

 

[Screenshot: YARN_AM_Message.PNG]

For "default" queue, the below parameters are set through "fair-scheduler.xml".  

 

[Screenshot: fair_scheduler_screenshot_2.PNG]
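
For reference, a minimal fair-scheduler.xml with only a default queue would look roughly like the sketch below. The values are illustrative placeholders, not the exact ones from my screenshot; maxAMShare is the Fair Scheduler setting that corresponds to the AM resource percent mentioned further down.

<?xml version="1.0"?>
<allocations>
  <queue name="default">
    <!-- illustrative values only -->
    <minResources>1024 mb, 1 vcores</minResources>
    <maxResources>3072 mb, 3 vcores</maxResources>
    <maxRunningApps>10</maxRunningApps>
    <!-- fraction of this queue's resources that Application Masters may use -->
    <maxAMShare>0.5</maxAMShare>
    <weight>1.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
</allocations>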

Also, no other jobs are currently running apart from the one I have launched. 

 

[Screenshot: yarn_job_status.PNG]

The screenshot below confirms that the maximum AM resource percent is greater than 0.1.

[Screenshot: Scheduler_AM_Percent.PNG]

 


Super Guru
@ssk26 ,

Can you please check the values of the settings below?

yarn.app.mapreduce.am.resource.mb
yarn.app.mapreduce.am.resource.cpu-vcores
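
If you cannot see them in Ambari, they would normally live in mapred-site.xml as something like the entries below (the values shown are only common examples, not necessarily what your cluster uses):

<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.cpu-vcores</name>
  <value>1</value>
</property>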

Cheers
Eric

Contributor

Hi @EricL

 

Thanks for your inputs. 

 

The value of yarn.app.mapreduce.am.resource.mb is set to 1024 in the "mapred-site.xml" file. 

 

I was not able to find the value of "yarn.app.mapreduce.am.resource.cpu-vcores" in any of the XML files (i.e. core-site.xml, mapred-site.xml, yarn-site.xml, capacity-scheduler.xml, etc.). 

 

<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>

 

Here is the progress that I have made so far:

 

- After enabling the YARN Fair Scheduler, I also configured the Spark program to use a fair scheduling pool, based on the default Spark fair scheduler XML template (see the sketch after this list). 

- The minimum and maximum allocations for the Fair Scheduler in YARN are set to 1024 MB and 3072 MB respectively. 

- After running a single Spark job (with both driver and executor memory set to 512 MB), I was able to verify that the job runs, but it consumes the entire 3 GB of memory. 

- So, the next Spark job does not run at all, as it is waiting for memory. 

- But if I revert YARN to the Capacity Scheduler, both jobs run fine with the same memory settings. 
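
For context, a Spark-side pool file based on the bundled fairscheduler.xml.template looks roughly like the sketch below (the pool name and values are illustrative, not my exact ones). As far as I understand, this file only controls scheduling of jobs within a single Spark application (picked up via spark.scheduler.mode=FAIR and spark.scheduler.allocation.file), while the YARN Fair Scheduler arbitrates between applications.

<?xml version="1.0"?>
<allocations>
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>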

 

So, what additional memory-related parameters need to be set for the Fair Scheduler so that the jobs run properly? 

 

Please help me fix this issue. 

 

Contributor

Hi @EricL ,

 

This is just a gentle reminder.

 

Can you please help me fix this issue? 

 

Thanks and Regards,

Sudhindra

Contributor

Hi @EricL , 

 

I am still facing the same issue when I use YARN fair scheduler to run the Spark jobs. 

 

With the same memory configuration, the Spark jobs are running fine when YARN Capacity Scheduler is used. 

 

Can you please help me fix this issue? 

 

Thanks and Regards,

Sudhindra

 

 

Super Guru
@ssk26,

Have you tried increasing yarn.app.mapreduce.am.resource.mb? The default value is 1 GB, and I can see from your screenshot that the requested 1 GB exceeds the AM's limit.

Cheers
Eric

Contributor

Hi @EricL ,

 

I did change the parameter "yarn.app.mapreduce.am.resource.mb" to 2 GB (2048 MB). 

 

Although the second Spark job now starts under the Fair Scheduler configuration, its tasks are not getting the required resources at all:

 

[Stage 0:> (0 + 0) / 1]20/01/09 22:58:01 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
20/01/09 22:58:16 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
20/01/09 22:58:31 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

 

Here is the relevant information about the cluster:

1. Number of nodes in the Cluster: 2

2. Total memory of the cluster: 15.28 GB 

    (yarn.nodemanager.resource.memory-mb = 7821 MB

    yarn.app.mapreduce.am.resource.mb = 2048 MB 

    yarn.scheduler.minimum-allocation-mb = 1024 MB 

    yarn.scheduler.maximum-allocation-mb = 3072 MB) 

3. Number of executors set through the program: 5 (spark.num.executors)

4. Number of cores set through the program: 3  (spark.executor.cores)

5. Spark Driver Memory and Spark Executor Memory:  2g each 

 

Please help me understand what else is going wrong.  

 

Note: With the same set of parameters (and yarn.app.mapreduce.am.resource.mb of 1024 MB), the Spark jobs run fine when the YARN Capacity Scheduler is used. However, they don't run when the YARN Fair Scheduler is used. So, I want to understand what goes wrong only with the Fair Scheduler. 

Super Guru
Have you tried @lyubomirangelo's suggestion? He noticed that you have 0 vcores configured; I am not sure whether that has any negative impact.

And you only have 2 nodes in the cluster, meaning only one NodeManager? One master and one worker node?

Contributor

Hi @lyubomirangelo and @EricL , 

 

Sorry for the delayed response.  Thanks for your inputs. 

 

I have already changed the number of vcores, but I am still facing the same issue. 

 

In the meantime, I was able to execute the jobs with the YARN Capacity Scheduler (with the same memory configuration), so I am not sure what's wrong with the YARN Fair Scheduler settings. 

 

Please let me know if any specific settings are required for the YARN Fair Scheduler. 

 

Also, I am still using the default queue. I haven't set up a separate queue for the Fair Scheduler. 
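
For context, my understanding is that the Fair Scheduler itself is switched on in yarn-site.xml with settings roughly like the sketch below (this is only a generic example; the file path and values in my Ambari-managed cluster may differ):

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
  <!-- path to the allocation file, i.e. the fair-scheduler.xml mentioned above -->
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>false</value>
</property>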

 

Thanks and Regards, 

Sudhindra

Contributor

Hi Sudhindra,

 

Thank you for the update. 

 

Can you share the SparkConf you use for your applications?

 

The following settings should work for small-resource apps (note that dynamic allocation is disabled):

from pyspark import SparkConf  # import added so the snippet is self-contained

conf = (SparkConf().setAppName("simple")
        .set("spark.shuffle.service.enabled", "false")
        .set("spark.dynamicAllocation.enabled", "false")
        .set("spark.cores.max", "1")
        .set("spark.executor.instances", "2")
        .set("spark.executor.memory", "200m")
        .set("spark.executor.cores", "1"))  # closing parenthesis was missing
From:

https://stackoverflow.com/questions/44581585/warn-cluster-yarnscheduler-initial-job-has-not-accepted...

 

PS: Please share the number of cores available on your nodes; spark.executor.cores should not be higher than the number of cores available on each node. Also, are you running Spark in cluster or client mode?

 

HTH

 

Best,
Lyubomir