Created on 10-07-2014 11:20 PM - edited 09-16-2022 02:09 AM
When I execute the following in yarn-client mode it works fine and returns the result properly, but when I try to run it in yarn-cluster mode I get an error.
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client /home/abc/spark/examples/lib/spark-examples_2.10-1.0.0-cdh5.1.0.jar 10
The above command works fine, but when I execute the same job in yarn-cluster mode I get the following error.
14/10/07 09:40:24 INFO Client: Application report from ASM:
  application identifier: application_1412117173893_1150
  appId: 1150
  clientToAMToken: Token { kind: YARN_CLIENT_TOKEN, service: }
  appDiagnostics:
  appMasterHost: N/A
  appQueue: root.default
  appMasterRpcPort: -1
  appStartTime: 1412689195537
  yarnAppState: ACCEPTED
  distributedFinalState: UNDEFINED
  appTrackingUrl: http://spark.abcd.com:8088/proxy/application_1412117173893_1150/
  appUser: abc
14/10/07 09:40:25 INFO Client: Application report from ASM:
  application identifier: application_1412117173893_1150
  appId: 1150
  clientToAMToken: null
  appDiagnostics: Application application_1412117173893_1150 failed 2 times due to AM Container for appattempt_1412117173893_1150_000002 exited with exitCode: 1 due to: Exception from container-launch:
    org.apache.hadoop.util.Shell$ExitCodeException:
      at org.apache.hadoop.util.Shell.runCommand(Shell.java:511)
      at org.apache.hadoop.util.Shell.run(Shell.java:424)
      at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:656)
      at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:279)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)
    main : command provided 1
    main : user is abc
    main : requested yarn user is abc
    Container exited with a non-zero exit code 1
    .Failing this attempt.. Failing the application.
  appMasterHost: N/A
  appQueue: root.default
  appMasterRpcPort: -1
  appStartTime: 1412689195537
  yarnAppState: FAILED
  distributedFinalState: FAILED
  appTrackingUrl: spark.abcd.com:8088/cluster/app/application_1412117173893_1150
  appUser: abc
Where might the problem be? Sometimes when I try to execute in yarn-cluster mode I get the following instead, but I didn't see any result:
14/10/08 01:51:57 INFO Client: Application report from ASM:
  application identifier: application_1412117173893_1442
  appId: 1442
  clientToAMToken: Token { kind: YARN_CLIENT_TOKEN, service: }
  appDiagnostics:
  appMasterHost: spark.abcd.com
  appQueue: root.default
  appMasterRpcPort: 0
  appStartTime: 1412747485673
  yarnAppState: FINISHED
  distributedFinalState: SUCCEEDED
  appTrackingUrl: http://spark.abcd.com:8088/proxy/application_1412117173893_1442/A
  appUser: abc
Thanks
Created 11-24-2014 12:17 AM
I am hitting the same issue now. Have you found a way to solve it?
Created 12-23-2014 01:48 AM
Try the command below to get the detailed logs:
$HADOOP_HOME/bin/yarn logs -applicationId application_1419229907721_0010
Created 06-18-2015 12:54 AM
You are not specifying the jar that contains that class (the examples jar).
It could be that the jar is included automatically in local mode but not on the YARN classpath.
Have a look at the NodeManager log for the node that tried to run it to verify whether it's a classpath issue.
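As a quick sanity check (a sketch only, reusing the jar path from the original post), you can confirm the class is actually packaged in the jar being submitted and, if needed, ship the jar explicitly so YARN localizes it for the containers instead of relying on the node's classpath:

# Sketch: confirm the class is inside the jar that is being submitted
jar tf /home/abc/spark/examples/lib/spark-examples_2.10-1.0.0-cdh5.1.0.jar | grep SparkPi

# Optionally ship the jar explicitly as well (same command as the original post, switched to cluster mode)
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster \
  --jars /home/abc/spark/examples/lib/spark-examples_2.10-1.0.0-cdh5.1.0.jar \
  /home/abc/spark/examples/lib/spark-examples_2.10-1.0.0-cdh5.1.0.jar 10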
Created 01-29-2016 10:35 PM
An exit code of -1 means Java crashed; normally that is due to classpath or memory settings.
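If memory is the suspect, the driver and executor sizes can be raised on the submit command. The values below are placeholders for illustration, not recommendations:

# Placeholder sizes; tune to your cluster. In yarn-cluster mode --driver-memory
# also sizes the ApplicationMaster container, since the driver runs inside it.
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster \
  --driver-memory 2g \
  --executor-memory 2g \
  /home/abc/spark/examples/lib/spark-examples_2.10-1.0.0-cdh5.1.0.jar 10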
Created 06-12-2016 08:09 AM
How was this fixed?
Created 06-13-2016 04:42 AM
For the application, download the application logs and check what the error is in the logs:
yarn logs -applicationId APPID -appOwner USERID
Check the exit codes of the application and you should be able to tell in a bit more detail what is going on.
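For example, the output can be captured to a file and scanned for the failure (placeholders as above; this is just an illustrative usage, not a required step):

# -appOwner is needed when the logs belong to a different user than the one running the command.
yarn logs -applicationId APPID -appOwner USERID > app_logs.txt
grep -iE "error|exception|exit code" app_logs.txt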
Wilfred
Created 06-19-2015 03:24 AM
This error:
main : command provided 1
main : user is abc
main : requested yarn user is abc
Container exited with a non-zero exit code 1
Looks like the exit code from the linux container executor.
In cluster mode the driver runs inside the same container as the Application Master which makes a difference.
As other people have said already get the logs from the containers by running:
yarn logs -applicationId APPID
Make sure that you run it as the user "abc" (same as the user that executes the spark command).
Wilfred
I am running my Spark Streaming application using spark-submit in yarn-cluster mode. When I run it in local mode it works fine, but when I try to run it on yarn-cluster using spark-submit, it runs for some time and then exits with the following exception.
Diagnostics: Exception from container-launch.
Container id: container_1435576266959_1208_02_000002
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
  at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
  at org.apache.hadoop.util.Shell.run(Shell.java:455)
  at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
  at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Any help will be appreciated.
Created 10-09-2015 12:22 AM
You'll have to view the logs on the YARN node that ran the executor; it's not very obvious how to find the logs in the YARN console.
If I had to make a wild guess, I would say the user you are running the job as doesn't exist on the node running the executor.
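A quick way to test that guess (hostnames below are hypothetical placeholders) is to check whether the account exists on each NodeManager host:

# Hypothetical hostnames; replace with the actual NodeManager hosts and the submitting user.
# "id" prints the uid/gid if the account exists, or "no such user" if it doesn't.
user=abc
for host in nodemanager1 nodemanager2 nodemanager3; do
  echo "== $host =="
  ssh "$host" id "$user"
done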
Created 02-09-2017 10:01 PM
Hi ArunShell,
I encountered a similar error: when running Spark, the user cannot be found! Please help me, thank you!
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  --executor-memory 1G \
  --num-executors 1 \
  --num-executors 2 \
  --driver-memory 1g \
  --executor-cores 1 \
  --principal kadmin/admin@NGAA.COM \
  --keytab /home/test/sparktest/princpal/sparkjob.keytab \
  /opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar 12
error messages:
17/02/10 13:54:16 INFO security.UserGroupInformation: Login successful for user kadmin/admin@NGAA.COM using keytab file /home/test/sparktest/princpal/sparkjob.keytab
17/02/10 13:54:16 INFO spark.SparkContext: Running Spark version 1.6.0
17/02/10 13:54:16 INFO spark.SecurityManager: Changing view acls to: root,kadmin
17/02/10 13:54:16 INFO spark.SecurityManager: Changing modify acls to: root,kadmin
17/02/10 13:54:16 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, kadmin); users with modify permissions: Set(root, kadmin)
17/02/10 13:54:17 INFO util.Utils: Successfully started service 'sparkDriver' on port 56214.
17/02/10 13:54:17 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/02/10 13:54:17 INFO Remoting: Starting remoting
17/02/10 13:54:18 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.10.100.51:40936]
17/02/10 13:54:18 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@10.10.100.51:40936]
17/02/10 13:54:18 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 40936.
17/02/10 13:54:18 INFO spark.SparkEnv: Registering MapOutputTracker
17/02/10 13:54:18 INFO spark.SparkEnv: Registering BlockManagerMaster
17/02/10 13:54:18 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-cf37cdde-4eab-4804-b84b-b5f937828aa7
17/02/10 13:54:18 INFO storage.MemoryStore: MemoryStore started with capacity 530.3 MB
17/02/10 13:54:18 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/02/10 13:54:19 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/02/10 13:54:19 INFO ui.SparkUI: Started SparkUI at http://10.10.100.51:4040
17/02/10 13:54:19 INFO spark.SparkContext: Added JAR file:/opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar at spark://10.10.100.51:56214/jars/spark-examples.jar with timestamp 1486706059601
17/02/10 13:54:19 INFO yarn.Client: Attempting to login to the Kerberos using principal: kadmin/admin@NGAA.COM and keytab: /home/test/sparktest/princpal/sparkjob.keytab
17/02/10 13:54:19 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/10.10.100.51:8032
17/02/10 13:54:20 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
17/02/10 13:54:20 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
17/02/10 13:54:20 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
17/02/10 13:54:20 INFO yarn.Client: Setting up container launch context for our AM
17/02/10 13:54:20 INFO yarn.Client: Setting up the launch environment for our AM container
17/02/10 13:54:21 INFO yarn.Client: Credentials file set to: credentials-79afe260-414b-4df7-8242-3cd1a279dbc7
17/02/10 13:54:21 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002
17/02/10 13:54:21 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 44 for kadmin on 10.10.100.52:8020
17/02/10 13:54:21 INFO yarn.Client: Renewal Interval set to 86400061
17/02/10 13:54:21 INFO yarn.Client: Preparing resources for our AM container
17/02/10 13:54:21 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002
17/02/10 13:54:21 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 45 for kadmin on 10.10.100.52:8020
17/02/10 13:54:22 INFO hive.metastore: Trying to connect to metastore with URI thrift://hadoop1:9083
17/02/10 13:54:22 INFO hive.metastore: Opened a connection to metastore, current connections: 1
17/02/10 13:54:22 INFO hive.metastore: Connected to metastore.
17/02/10 13:54:22 INFO hive.metastore: Closed a connection to metastore, current connections: 0
17/02/10 13:54:23 INFO yarn.Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN Secure Distributed Cache.
17/02/10 13:54:23 INFO yarn.Client: Uploading resource file:/home/test/sparktest/princpal/sparkjob.keytab -> hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002/sparkjob.keytab
17/02/10 13:54:23 INFO yarn.Client: Uploading resource file:/tmp/spark-79d08367-6f8d-4cb3-813e-d450e90a3128/__spark_conf__4615276915023723512.zip -> hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002/__spark_conf__4615276915023723512.zip
17/02/10 13:54:23 INFO spark.SecurityManager: Changing view acls to: root,kadmin
17/02/10 13:54:23 INFO spark.SecurityManager: Changing modify acls to: root,kadmin
17/02/10 13:54:23 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, kadmin); users with modify permissions: Set(root, kadmin)
17/02/10 13:54:23 INFO yarn.Client: Submitting application 2 to ResourceManager
17/02/10 13:54:23 INFO impl.YarnClientImpl: Submitted application application_1486705141135_0002
17/02/10 13:54:24 INFO yarn.Client: Application report for application_1486705141135_0002 (state: FAILED)
17/02/10 13:54:24 INFO yarn.Client:
  client token: N/A
  diagnostics: Application application_1486705141135_0002 failed 2 times due to AM Container for appattempt_1486705141135_0002_000002 exited with exitCode: -1000
  For more detailed output, check application tracking page: http://hadoop1:8088/proxy/application_1486705141135_0002/ Then, click on links to logs of each attempt.
  Diagnostics: Application application_1486705141135_0002 initialization failed (exitCode=255) with output:
    main : command provided 0
    main : run as user is kadmin
    main : requested yarn user is kadmin
    User kadmin not found
  Failing this attempt. Failing the application.
  ApplicationMaster host: N/A
  ApplicationMaster RPC port: -1
  queue: root.users.kadmin
  start time: 1486706063635
  final status: FAILED
  tracking URL: http://hadoop1:8088/cluster/app/application_1486705141135_0002
  user: kadmin
17/02/10 13:54:24 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1486705141135_0002
17/02/10 13:54:24 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
  at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
  at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
  at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/10 13:54:25 INFO ui.SparkUI: Stopped Spark web UI at http://10.10.100.51:4040
17/02/10 13:54:25 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
17/02/10 13:54:25 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
17/02/10 13:54:25 INFO cluster.YarnClientSchedulerBackend: Stopped
17/02/10 13:54:25 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/02/10 13:54:25 ERROR util.Utils: Uncaught exception in thread main
java.lang.NullPointerException
  at org.apache.spark.network.shuffle.ExternalShuffleClient.close(ExternalShuffleClient.java:152)
  at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1231)
  at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96)
  at org.apache.spark.SparkContext$$anonfun$stop$12.apply$mcV$sp(SparkContext.scala:1767)
  at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1230)
  at org.apache.spark.SparkContext.stop(SparkContext.scala:1766)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:613)
  at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
  at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/10 13:54:25 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
  at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
  at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
  at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/10 13:54:25 INFO storage.DiskBlockManager: Shutdown hook called
17/02/10 13:54:25 INFO util.ShutdownHookManager: Shutdown hook called
17/02/10 13:54:25 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-79d08367-6f8d-4cb3-813e-d450e90a3128/userFiles-58912a50-d060-42ec-8665-7a74c1be9a7b
17/02/10 13:54:25 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-79d08367-6f8d-4cb3-813e-d450e90a3128
Key point:
main : run as user is kadmin
main : requested yarn user is kadmin
User kadmin not found
Thanks
Created 02-10-2017 12:32 AM
In your case, I think the issue is that the "kadmin" user doesn't exist in Linux (or at least not on all nodes).
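One way to address that (a sketch only; production clusters usually provision accounts centrally via LDAP/SSSD rather than creating local users) is to create the account on every node, which is what resolved it in the reply below:

# Sketch with hypothetical hostnames: create a local "kadmin" account on each node.
for host in hadoop1 hadoop2 hadoop3 hadoop4; do
  ssh root@"$host" useradd kadmin
done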
Created 02-13-2017 08:10 PM
Thank you, you are right. Once I created the kadmin user on each Linux machine, I could submit the task successfully!