Created on 12-25-2019 10:59 AM - edited 12-25-2019 12:16 PM
I am using a Cloudera quickstart VM 5.13.0.0 to run Spark applications in yarn-client mode. I have allocated 10 GB of memory and 3 cores to the VM. When I submit an application, it is ACCEPTED but never moves on to RUNNING. When I try to look at the logs using yarn logs -applicationId <appId>, I see nothing; the output is absolutely blank.
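For reference, the submission looks roughly like this (the jar path here is a placeholder; the class name is the one that appears in the stack trace further down):

spark-submit \
  --master yarn \
  --deploy-mode client \
  --class cloudera.jobs.ClouderaSampleJob \
  /path/to/cloudera-sample-job.jar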
I have looked up this issue in several similar threads and have practically meddled with all the configs those discussions point to, but I still do not have an answer, even though on the face of it my problem looks just like theirs. Here are the config parameters of my Cloudera cluster:
mapreduce.map.memory.mb: 128M
mapreduce.reduce.memory.mb: 128M
mapreduce.job.heap.memory-mb.ratio: 0.8
yarn.nodemanager.resource.memory-mb: 1900M
yarn.nodemanager.resource.percentage-physical-cpu-limit: 100
yarn.nodemanager.resource.cpu-vcores: 1
yarn.scheduler.minimum-allocation-mb: 1M
yarn.scheduler.increment-allocation-mb: 100M
yarn.scheduler.maximum-allocation-mb: 1600M
yarn.scheduler.minimum-allocation-vcores: 1
yarn.scheduler.increment-allocation-vcores: 1
yarn.scheduler.maximum-allocation-vcores: 2
yarn.scheduler.fair.continuous-scheduling-enabled: unchecked
mapreduce.am.max-attempts: 1
yarn.resourcemanager.am.max-retries / yarn.resourcemanager.am.max-attempts: 1
yarn.app.mapreduce.am.resource.mb: 1G
yarn.app.mapreduce.am.resource.cpu-vcores: 1
ApplicationMaster Java Maximum Heap Size: 512M
yarn.resourcemanager.scheduler.class: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
yarn.scheduler.fair.user-as-default-queue: unchecked
yarn.scheduler.fair.preemption: unchecked
yarn.scheduler.fair.preemption.cluster-utilization-threshold: 0.8
yarn.scheduler.fair.sizebasedweight: unchecked

Fair Scheduler Allocations (deployed):

{"defaultFairSharePreemptionThreshold":null,"defaultFairSharePreemptionTimeout":null,"defaultMinSharePreemptionTimeout":null,"defaultQueueSchedulingPolicy":"drf","queueMaxAMShareDefault":-1.0,"queueMaxAppsDefault":null,"queuePlacementRules":[{"create":true,"name":"specified","queue":null,"rules":null},{"create":null,"name":"nestedUserQueue","queue":null,"rules":[{"create":true,"name":"default","queue":"users","rules":null}]},{"create":null,"name":"default","queue":null,"rules":null}],"queues":[{"aclAdministerApps":null,"aclSubmitApps":null,"allowPreemptionFrom":null,"fairSharePreemptionThreshold":null,"fairSharePreemptionTimeout":null,"minSharePreemptionTimeout":null,"name":"root","queues":[{"aclAdministerApps":null,"aclSubmitApps":null,"allowPreemptionFrom":null,"fairSharePreemptionThreshold":null,"fairSharePreemptionTimeout":null,"minSharePreemptionTimeout":null,"name":"default","queues":[],"schedulablePropertiesList":[{"impalaDefaultQueryMemLimit":null,"impalaDefaultQueryOptions":null,"impalaMaxMemory":null,"impalaMaxQueuedQueries":null,"impalaMaxRunningQueries":null,"impalaQueueTimeout":null,"maxAMShare":-1.0,"maxChildResources":null,"maxResources":null,"maxRunningApps":null,"minResources":null,"scheduleName":"default","weight":1.0}],"schedulingPolicy":"drf","type":null},{"aclAdministerApps":null,"aclSubmitApps":null,"allowPreemptionFrom":null,"fairSharePreemptionThreshold":null,"fairSharePreemptionTimeout":null,"minSharePreemptionTimeout":null,"name":"users","queues":[],"schedulablePropertiesList":[{"impalaDefaultQueryMemLimit":null,"impalaDefaultQueryOptions":null,"impalaMaxMemory":null,"impalaMaxQueuedQueries":null,"impalaMaxRunningQueries":null,"impalaQueueTimeout":null,"maxAMShare":-1.0,"maxChildResources":null,"maxResources":null,"maxRunningApps":null,"minResources":null,"scheduleName":"default","weight":1.0}],"schedulingPolicy":"drf","type":"parent"}],"schedulablePropertiesList":[{"impalaDefaultQueryMemLimit":null,"impalaDefaultQueryOptions":null,"impalaMaxMemory":null,"impalaMaxQueuedQueries":null,"impalaMaxRunningQueries":null,"impalaQueueTimeout":null,"maxAMShare":null,"maxChildResources":null,"maxResources":null,"maxRunningApps":null,"minResources":null,"scheduleName":"default","weight":1.0}],"schedulingPolicy":"drf","type":null}],"userMaxAppsDefault":1,"users":[]}
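As a sanity check on these numbers, here is a back-of-the-envelope estimate of the AM container size, assuming Spark's yarn-client defaults (spark.yarn.am.memory = 512m plus the 384 MB minimum memory overhead; the numbers change if those are overridden):

# AM container sizing estimate (assumes Spark defaults, no overrides)
AM_MEM_MB=512                              # spark.yarn.am.memory default
OVERHEAD_MB=384                            # max(384, 0.10 * 512) = 384
REQUEST_MB=$((AM_MEM_MB + OVERHEAD_MB))    # 896 MB requested
# minimum-allocation-mb=1 with increment-allocation-mb=100 rounds this up
# to 900 MB, which fits under maximum-allocation-mb (1600 MB) and under the
# NodeManager's 1900 MB, so the AM container itself should be schedulable.
echo "${REQUEST_MB} MB requested -> ~900 MB allocated"

So on paper the AM fits, which matches the RM UI screenshot below showing one running container.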
Here is what the queue description looks like: [screenshot]

Likewise, here is the record from the YARN RM UI. Note that, as you can see in the screenshot, the CPU and memory have been allocated and Running Containers shows 1, but the status is still ACCEPTED! [screenshot]

Here is the Application Summary: [screenshot]

Here are the application logs (empty): [screenshot]
And, lastly, here is what the driver sees:
19/12/26 00:16:42 INFO Client:
     client token: N/A
     diagnostics: Application application_1577297544619_0002 failed 1 times due to AM Container for appattempt_1577297544619_0002_000001 exited with exitCode: 10
     For more detailed output, check application tracking page: http://quickstart.cloudera:8088/proxy/application_1577297544619_0002/ Then, click on links to logs of each attempt.
     Diagnostics: Exception from container-launch.
     Container id: container_1577297544619_0002_01_000001
     Exit code: 10
     Stack trace: ExitCodeException exitCode=10:
         at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
         at org.apache.hadoop.util.Shell.run(Shell.java:507)
         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
         at java.lang.Thread.run(Thread.java:748)
     Container exited with a non-zero exit code 10
     Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: root.default
     start time: 1577299469533
     final status: FAILED
     tracking URL: http://quickstart.cloudera:8088/cluster/app/application_1577297544619_0002
     user: shepanch
19/12/26 00:16:42 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:165)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:512)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2511)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
    at cloudera.jobs.ClouderaSampleJob$.delayedEndpoint$cloudera$jobs$ClouderaSampleJob$1(ClouderaSampleJob.scala:17)
    at cloudera.jobs.ClouderaSampleJob$delayedInit$body.apply(ClouderaSampleJob.scala:6)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at cloudera.jobs.ClouderaSampleJob$.main(ClouderaSampleJob.scala:6)
    at cloudera.jobs.ClouderaSampleJob.main(ClouderaSampleJob.scala)
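For completeness: exit code 10 from the Spark ApplicationMaster usually corresponds to an uncaught exception inside the AM, so the real error should be in the AM container's stderr. If log aggregation is disabled (which would explain the empty yarn logs output), the logs stay on the NodeManager's local disk; the paths below assume CDH defaults and may differ on other setups:

# Is log aggregation on? If it is false, 'yarn logs' prints nothing.
grep -B1 -A2 "yarn.log-aggregation-enable" /etc/hadoop/conf/yarn-site.xml

# With aggregation off, container logs stay on the NodeManager's local disk:
ls /var/log/hadoop-yarn/container/application_1577297544619_0002/    # one dir per container
cat /var/log/hadoop-yarn/container/application_1577297544619_0002/container_1577297544619_0002_01_000001/stderr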
Is there anything that can be done to solve this issue?
Created 12-27-2019 04:40 AM
Hi
To check the resources that remain available as containers for your application are allocated and released, run the following grep against the ResourceManager log:

grep "Released container" RM_logs_file | grep "1577297544619_0002" | grep "> available"
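In case it helps locate that file: on a Cloudera-Manager-managed quickstart VM the ResourceManager log is normally under /var/log/hadoop-yarn/, but the exact file name below is an assumption you should verify on your system:

# Assumed location/name of the RM log; adjust if your layout differs
RM_LOG=$(ls /var/log/hadoop-yarn/*RESOURCEMANAGER* 2>/dev/null | head -1)
grep "Released container" "$RM_LOG" | grep "1577297544619_0002" | grep "> available"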
Created 12-28-2019 02:43 AM
There is a great YARN tuning spreadsheet here that will help you calculate your YARN settings correctly. It applies to YARN clusters only and describes how to tune and optimize YARN for your cluster.
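As a rough sketch of the core calculation that spreadsheet automates: a node can run as many containers as both its memory and its vcores allow, whichever is smaller. Using the settings you posted (and the ~900 MB container estimate from your question as an assumed request size):

# Sketch of the spreadsheet's core calculation, with this VM's settings
NODE_MEM_MB=1900         # yarn.nodemanager.resource.memory-mb
NODE_VCORES=1            # yarn.nodemanager.resource.cpu-vcores
CONTAINER_MEM_MB=900     # assumed per-container request
CONTAINER_VCORES=1
echo "by memory: $((NODE_MEM_MB / CONTAINER_MEM_MB)) containers"     # 2
echo "by vcores: $((NODE_VCORES / CONTAINER_VCORES)) container"      # 1
# min(2, 1) = 1 container at a time: with a single vcore, the AM and an
# executor can never run concurrently under DRF scheduling.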
Please report back with what you find.