
Issue on running spark application in Yarn-cluster mode

Explorer

When I execute the following in yarn-client mode it works fine and gives the result properly, but when I try to run in yarn-cluster mode I get an error.

 

spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client /home/abc/spark/examples/lib/spark-examples_2.10-1.0.0-cdh5.1.0.jar 10

 The above command works fine, but when I execute the same job in yarn-cluster mode I get the following error.
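
For reference, the equivalent cluster-mode submission would look something like this (same example jar, only the master changes):

spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster /home/abc/spark/examples/lib/spark-examples_2.10-1.0.0-cdh5.1.0.jar 10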

 

14/10/07 09:40:24 INFO Client: Application report from ASM:
         application identifier: application_1412117173893_1150
         appId: 1150
         clientToAMToken: Token { kind: YARN_CLIENT_TOKEN, service:  }
         appDiagnostics:
         appMasterHost: N/A
         appQueue: root.default
         appMasterRpcPort: -1
         appStartTime: 1412689195537
         yarnAppState: ACCEPTED
         distributedFinalState: UNDEFINED
         appTrackingUrl: http://spark.abcd.com:8088/proxy/application_1412117173893_1150/
         appUser: abc
14/10/07 09:40:25 INFO Client: Application report from ASM:
         application identifier: application_1412117173893_1150
         appId: 1150
         clientToAMToken: null
         appDiagnostics: Application application_1412117173893_1150 failed 2 times due to AM Container for appattempt_1412117173893_1150_000002 exited with  exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:511)
        at org.apache.hadoop.util.Shell.run(Shell.java:424)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:656)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:279)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

main : command provided 1
main : user is abc
main : requested yarn user is abc

Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.
         appMasterHost: N/A
         appQueue: root.default
         appMasterRpcPort: -1
         appStartTime: 1412689195537
         yarnAppState: FAILED
         distributedFinalState: FAILED
         appTrackingUrl: spark.abcd.com:8088/cluster/app/application_1412117173893_1150
         appUser: abc

 Where might the problem be? Sometimes when I try to execute in yarn-cluster mode I get the following output, but I don't see any result.

14/10/08 01:51:57 INFO Client: Application report from ASM:
         application identifier: application_1412117173893_1442
         appId: 1442
         clientToAMToken: Token { kind: YARN_CLIENT_TOKEN, service:  }
         appDiagnostics:
         appMasterHost: spark.abcd.com
         appQueue: root.default
         appMasterRpcPort: 0
         appStartTime: 1412747485673
         yarnAppState: FINISHED
         distributedFinalState: SUCCEEDED
         appTrackingUrl: http://spark.abcd.com:8088/proxy/application_1412117173893_1442/A
         appUser: abc

 Thanks

13 REPLIES

Rising Star

You'll have to view the logs on the YARN node running the executor; it's not very obvious how to see the logs in the YARN console.

If I had to make a wild guess, I would say the user you are running the job as doesn't exist on the node running the executor.
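
One way to pull the container logs from the command line, assuming YARN log aggregation is enabled, is the yarn logs CLI with the application ID from the report above:

yarn logs -applicationId application_1412117173893_1150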

Contributor

Hi ArunShell,

I ran into a similar error: when running Spark, the user cannot be found. Please help me, thank you!

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-client \
--executor-memory 1G \
--num-executors 2 \
--driver-memory 1g \
--executor-cores 1 \
--principal kadmin/admin@NGAA.COM \
--keytab   /home/test/sparktest/princpal/sparkjob.keytab \
/opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar 12

Error messages:

17/02/10 13:54:16 INFO security.UserGroupInformation: Login successful for user kadmin/admin@NGAA.COM using keytab file /home/test/sparktest/princpal/sparkjob.keytab
17/02/10 13:54:16 INFO spark.SparkContext: Running Spark version 1.6.0
17/02/10 13:54:16 INFO spark.SecurityManager: Changing view acls to: root,kadmin
17/02/10 13:54:16 INFO spark.SecurityManager: Changing modify acls to: root,kadmin
17/02/10 13:54:16 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, kadmin); users with modify permissions: Set(root, kadmin)
17/02/10 13:54:17 INFO util.Utils: Successfully started service 'sparkDriver' on port 56214.
17/02/10 13:54:17 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/02/10 13:54:17 INFO Remoting: Starting remoting
17/02/10 13:54:18 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.10.100.51:40936]
17/02/10 13:54:18 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@10.10.100.51:40936]
17/02/10 13:54:18 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 40936.
17/02/10 13:54:18 INFO spark.SparkEnv: Registering MapOutputTracker
17/02/10 13:54:18 INFO spark.SparkEnv: Registering BlockManagerMaster
17/02/10 13:54:18 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-cf37cdde-4eab-4804-b84b-b5f937828aa7
17/02/10 13:54:18 INFO storage.MemoryStore: MemoryStore started with capacity 530.3 MB
17/02/10 13:54:18 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/02/10 13:54:19 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/02/10 13:54:19 INFO ui.SparkUI: Started SparkUI at http://10.10.100.51:4040
17/02/10 13:54:19 INFO spark.SparkContext: Added JAR file:/opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar at spark://10.10.100.51:56214/jars/spark-examples.jar with timestamp 1486706059601
17/02/10 13:54:19 INFO yarn.Client: Attempting to login to the Kerberos using principal: kadmin/admin@NGAA.COM and keytab: /home/test/sparktest/princpal/sparkjob.keytab
17/02/10 13:54:19 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/10.10.100.51:8032
17/02/10 13:54:20 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
17/02/10 13:54:20 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
17/02/10 13:54:20 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
17/02/10 13:54:20 INFO yarn.Client: Setting up container launch context for our AM
17/02/10 13:54:20 INFO yarn.Client: Setting up the launch environment for our AM container
17/02/10 13:54:21 INFO yarn.Client: Credentials file set to: credentials-79afe260-414b-4df7-8242-3cd1a279dbc7
17/02/10 13:54:21 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002
17/02/10 13:54:21 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 44 for kadmin on 10.10.100.52:8020
17/02/10 13:54:21 INFO yarn.Client: Renewal Interval set to 86400061
17/02/10 13:54:21 INFO yarn.Client: Preparing resources for our AM container
17/02/10 13:54:21 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002
17/02/10 13:54:21 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 45 for kadmin on 10.10.100.52:8020
17/02/10 13:54:22 INFO hive.metastore: Trying to connect to metastore with URI thrift://hadoop1:9083
17/02/10 13:54:22 INFO hive.metastore: Opened a connection to metastore, current connections: 1
17/02/10 13:54:22 INFO hive.metastore: Connected to metastore.
17/02/10 13:54:22 INFO hive.metastore: Closed a connection to metastore, current connections: 0
17/02/10 13:54:23 INFO yarn.Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN Secure Distributed Cache.
17/02/10 13:54:23 INFO yarn.Client: Uploading resource file:/home/test/sparktest/princpal/sparkjob.keytab -> hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002/sparkjob.keytab
17/02/10 13:54:23 INFO yarn.Client: Uploading resource file:/tmp/spark-79d08367-6f8d-4cb3-813e-d450e90a3128/__spark_conf__4615276915023723512.zip -> hdfs://hadoop2:8020/user/kadmin/.sparkStaging/application_1486705141135_0002/__spark_conf__4615276915023723512.zip
17/02/10 13:54:23 INFO spark.SecurityManager: Changing view acls to: root,kadmin
17/02/10 13:54:23 INFO spark.SecurityManager: Changing modify acls to: root,kadmin
17/02/10 13:54:23 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, kadmin); users with modify permissions: Set(root, kadmin)
17/02/10 13:54:23 INFO yarn.Client: Submitting application 2 to ResourceManager
17/02/10 13:54:23 INFO impl.YarnClientImpl: Submitted application application_1486705141135_0002
17/02/10 13:54:24 INFO yarn.Client: Application report for application_1486705141135_0002 (state: FAILED)
17/02/10 13:54:24 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: Application application_1486705141135_0002 failed 2 times due to AM Container for appattempt_1486705141135_0002_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://hadoop1:8088/proxy/application_1486705141135_0002/Then, click on links to logs of each attempt.
Diagnostics: Application application_1486705141135_0002 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is kadmin
main : requested yarn user is kadmin
User kadmin not found

Failing this attempt. Failing the application.
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: root.users.kadmin
	 start time: 1486706063635
	 final status: FAILED
	 tracking URL: http://hadoop1:8088/cluster/app/application_1486705141135_0002
	 user: kadmin
17/02/10 13:54:24 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1486705141135_0002
17/02/10 13:54:24 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/10 13:54:25 INFO ui.SparkUI: Stopped Spark web UI at http://10.10.100.51:4040
17/02/10 13:54:25 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
17/02/10 13:54:25 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
17/02/10 13:54:25 INFO cluster.YarnClientSchedulerBackend: Stopped
17/02/10 13:54:25 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/02/10 13:54:25 ERROR util.Utils: Uncaught exception in thread main
java.lang.NullPointerException
	at org.apache.spark.network.shuffle.ExternalShuffleClient.close(ExternalShuffleClient.java:152)
	at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1231)
	at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96)
	at org.apache.spark.SparkContext$$anonfun$stop$12.apply$mcV$sp(SparkContext.scala:1767)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1230)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1766)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:613)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/10 13:54:25 INFO spark.SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/10 13:54:25 INFO storage.DiskBlockManager: Shutdown hook called
17/02/10 13:54:25 INFO util.ShutdownHookManager: Shutdown hook called
17/02/10 13:54:25 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-79d08367-6f8d-4cb3-813e-d450e90a3128/userFiles-58912a50-d060-42ec-8665-7a74c1be9a7b
17/02/10 13:54:25 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-79d08367-6f8d-4cb3-813e-d450e90a3128

Key point:

main : run as user is kadmin
main : requested yarn user is kadmin
User kadmin not found

 

 Thanks

Rising Star

In your case, I think the issue is that the "kadmin" user doesn't exist in Linux (or at least not on all nodes).

Contributor

Thank you, you are right. Once I created the kadmin user on each Linux machine, I could successfully submit the job!
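
For anyone hitting the same "User ... not found" diagnostic, a minimal sketch of the check-and-create step, run on every NodeManager host (the useradd options are an assumption; adjust UID/group policy to your environment):

# create the local account only if it is missing
id kadmin >/dev/null 2>&1 || sudo useradd -m kadmin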