I have set up the YARN Fair Scheduler in Ambari (HDP 126.96.36.199-78) for the "Default" queue itself. So far, I haven't added any new queues.
Now I want to run a simple job against the queue, but when I submit it, the application stays in the "ACCEPTED" state forever. I get the below message in YARN logs:
The additional information is given below. Please help me in fixing this issue at the earliest.
For "default" queue, the below parameters are set through "fair-scheduler.xml".
Also, no jobs are currently running, apart from the one that I have launched.
Given below is the screenshot, which confirms that the maximum AM resource percent is greater than 0.1
Thanks for your inputs.
The value of yarn.app.mapreduce.am.resource.mb is set to 1024 in "mapred-site.xml" file.
I was not able to find the value of "yarn.app.mapreduce.am.resource.cpu-vcores" in any of the XML files (i.e., core-site.xml, mapred-site.xml, yarn-site.xml, capacity-scheduler.xml, etc.).
Here is the progress that I have made so far:
- After enabling the YARN Fair Scheduler, I also configured the Spark program to use a fair scheduling pool (based on the default Spark fair scheduler XML template).
- The minimum and maximum allocation (in MB) for Fair Scheduler in Yarn is set to 1024 MB and 3072 MB respectively.
- After running a single Spark job [with both Driver and Executor memory set to 512MB], I was able to verify that the job is running. But, it was consuming the entire 3 GB memory.
- So, the next Spark job is not running at all, as it is waiting for the memory.
- But, if I revert back the YARN scheduling to "Capacity Scheduler", then with the same memory settings, both the jobs are running fine without any issues.
So, what additional memory related parameters need to be set in Fair Scheduling for the jobs to run properly?
Please help me in fixing this issue.
Hi @EricL ,
I am still facing the same issue when I use YARN fair scheduler to run the Spark jobs.
With the same memory configuration, the Spark jobs are running fine when YARN Capacity Scheduler is used.
Can you please help me in fixing this issue?
Thanks and Regards,
Hi @EricL ,
I did change the parameter "yarn.app.mapreduce.am.resource.mb" to 2 GB (2048 MB).
Although the second Spark job is now running fine under "Fair Scheduler" configuration, the tasks under the second Spark job are not getting the required number of resources at all.
[Stage 0:> (0 + 0) / 1]20/01/09 22:58:01 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
20/01/09 22:58:16 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
20/01/09 22:58:31 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Here is the important information about the cluster:
1. Number of nodes in the Cluster: 2
2. Total amount of memory of Cluster: 15.28 GB
(yarn.nodemanager.resource.memory-mb = 7821 MB
yarn.app.mapreduce.am.resource.mb = 2048 MB
yarn.scheduler.minimum-allocation-mb = 1024 MB
yarn.scheduler.maximum-allocation-mb = 3072 MB)
3. Number of executors set through the program: 5 (spark.num.executors)
4. Number of cores set through the program: 3 (spark.executor.cores)
5. Spark Driver Memory and Spark Executor Memory: 2g each
Please help me in understanding what else is going wrong.
Note: With the same set of parameters (along with yarn.app.mapreduce.am.resource.mb of 1024 MB), the Spark job runs fine when the YARN Capacity Scheduler is set. However, it doesn't run when the YARN Fair Scheduler is set. So, I want to understand what's going wrong only with the Fair Scheduler.
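For reference, here is a rough sketch of the container arithmetic for the settings listed above. It assumes Spark's default memory overhead rule (the larger of 384 MB or 10% of the requested memory) and assumes that YARN rounds each request up to a multiple of 1024 MB; the function name and the rounding increment are illustrative assumptions, not values taken from this cluster.

```python
import math

def yarn_container_mb(heap_mb, increment_mb=1024,
                      min_overhead_mb=384, overhead_factor=0.10):
    """Estimate the YARN container size for a Spark JVM.

    Assumptions (not read from the cluster): overhead is
    max(384 MB, 10% of heap), and requests are rounded up to a
    multiple of increment_mb.
    """
    overhead_mb = max(min_overhead_mb, int(heap_mb * overhead_factor))
    requested_mb = heap_mb + overhead_mb
    return math.ceil(requested_mb / increment_mb) * increment_mb

# Values quoted in the post above.
node_memory_mb = 7821        # yarn.nodemanager.resource.memory-mb
node_count = 2
executor_memory_mb = 2048    # spark.executor.memory = 2g
requested_executors = 5

executor_container_mb = yarn_container_mb(executor_memory_mb)
containers_per_node = node_memory_mb // executor_container_mb
cluster_slots = containers_per_node * node_count

# One extra container is needed for the application master / driver.
needed_containers = requested_executors + 1
fits = needed_containers <= cluster_slots

print("executor container:", executor_container_mb, "MB")
print("containers per node:", containers_per_node)
print("cluster slots:", cluster_slots, "needed:", needed_containers, "fits:", fits)
```

Under these assumptions a 2g executor becomes a 3072 MB container, so each 7821 MB node holds two of them and the cluster holds four, which is fewer than the five executors plus application master being requested. Whether this is what blocks the job under the Fair Scheduler also depends on the queue limits, so treat it as a sanity check only.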
Sorry for the delayed response. Thanks for your inputs.
I have already changed the number of vcores. But, I am still facing the same issue.
In the meantime, I was able to execute the jobs with YARN Capacity scheduler (with the same memory configuration). So, I am not sure what's wrong with the settings of YARN Fair Scheduler.
Please suggest if any specific settings are required for YARN Fair Scheduler.
Also, I am still using default queue. I haven't set a separate Queue for handling fair scheduler.
Thanks and Regards,
Thank you for the update.
Can you share the SparkConf you use for your applications?
The following settings should work for small resource apps (Note dynamic allocation is disabled):
from pyspark import SparkConf

# Small-footprint settings; dynamic allocation and the shuffle service are disabled.
conf = (SparkConf().setAppName("simple")
        .set("spark.shuffle.service.enabled", "false")
        .set("spark.dynamicAllocation.enabled", "false")
        .set("spark.cores.max", "1")
        .set("spark.executor.instances", "2")
        .set("spark.executor.memory", "200m")
        .set("spark.executor.cores", "1"))

PS: Please share the number of cores available on your nodes; spark.executor.cores should not be higher than the number of cores available on each node. Also, are you running Spark in cluster or client mode?
Please take a look at:
What type of FairScheduler are you using:
What is the weight of the default queue you are submitting your apps to?
From my perspective, you are limiting your default queue to a minimum of 1024 MB, 0 vCores and a maximum of 8192 MB, 0 vCores. In both cases no cores are set. When you try to run a job, it requires 1024 MB of memory and 1 vCore; it then fails to allocate the 1 vCore because of the 0 vCore min/max restriction, and it reports 'exceeds maximum AM resources allowed'.
That's why I think the issue is with the core utilization and not with memory.
In your screenshot the
Is set to 8192 mb, 0vcore
And your job requires at least 1vcore as seen in the Diagnostics section.
Please try increasing the vcore size in <queueMaxResourcesDefault> and try to run the job again.
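If it helps, here is a minimal sketch for checking an allocation file for limits that leave no vcores. The sample XML is a hypothetical reconstruction based on the values mentioned in this thread, not the actual fair-scheduler.xml from the cluster, and the helper names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical allocation file mirroring the values discussed in this thread.
SAMPLE_ALLOCATIONS = """
<allocations>
  <queue name="default">
    <minResources>1024 mb, 0 vcores</minResources>
    <maxResources>8192 mb, 0 vcores</maxResources>
  </queue>
  <queueMaxResourcesDefault>8192 mb, 0 vcores</queueMaxResourcesDefault>
</allocations>
"""

# Matches the "X mb, Y vcores" resource format.
RESOURCE_RE = re.compile(r"(\d+)\s*mb\s*,\s*(\d+)\s*vcores", re.IGNORECASE)

def parse_resources(text):
    """Return (memory_mb, vcores) parsed from an 'X mb, Y vcores' string."""
    match = RESOURCE_RE.search(text or "")
    if match is None:
        raise ValueError("unrecognised resource string: %r" % (text,))
    return int(match.group(1)), int(match.group(2))

def find_zero_vcore_limits(xml_text):
    """List every maximum-resource element that allows 0 vcores."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter():
        if elem.tag in ("maxResources", "queueMaxResourcesDefault"):
            memory_mb, vcores = parse_resources(elem.text)
            if vcores == 0:
                problems.append((elem.tag, memory_mb, vcores))
    return problems

problems = find_zero_vcore_limits(SAMPLE_ALLOCATIONS)
for tag, memory_mb, vcores in problems:
    print("%s allows %d MB but %d vcores" % (tag, memory_mb, vcores))
```

Any element it reports would need a non-zero vcore value before an application master that asks for 1 vcore can be placed in the queue.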