
Issue running Spark application in yarn-cluster mode

New Contributor

CDH 5.4

 

I am running my Spark Streaming application via spark-submit on yarn-cluster. In local mode it works fine, but on yarn-cluster it runs for some time and then exits with the exception shown below.
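For reference, the working local-mode invocation presumably looked something like the following (the local[2] thread count is my assumption, not a value from the original run):

$ spark-submit --master local[2] network_wordcount.py

The yarn-cluster submission that fails: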

 

$ spark-submit --master yarn --deploy-mode cluster network_wordcount.py

 

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/avro-tools-1.7.6-cdh5.4.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/30 21:59:34 INFO RMProxy: Connecting to ResourceManager at is-hadoop1nb.gwl.com/143.199.102.180:8032
16/08/30 21:59:34 INFO Client: Requesting a new application from cluster with 3 NodeManagers
16/08/30 21:59:34 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (29696 MB per container)
16/08/30 21:59:34 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/08/30 21:59:34 INFO Client: Setting up container launch context for our AM
16/08/30 21:59:34 INFO Client: Preparing resources for our AM container
16/08/30 21:59:35 INFO Client: Uploading resource file:/home/hdadmin/junk/network_wordcount.py -> hdfs://is-hadoop2nb.gwl.com:8020/user/hdadmin/.sparkStaging/application_1472577335755_0020/network_wordcount.py
16/08/30 21:59:35 INFO Client: Setting up the launch environment for our AM container
16/08/30 21:59:35 INFO SecurityManager: Changing view acls to: hdadmin
16/08/30 21:59:35 INFO SecurityManager: Changing modify acls to: hdadmin
16/08/30 21:59:35 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdadmin); users with modify permissions: Set(hdadmin)
16/08/30 21:59:35 INFO Client: Submitting application 20 to ResourceManager
16/08/30 21:59:35 INFO YarnClientImpl: Submitted application application_1472577335755_0020
16/08/30 21:59:36 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:36 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hdadmin
start time: 1472615975456
final status: UNDEFINED
tracking URL: http://is-hadoop1nb.gwl.com:8088/proxy/application_1472577335755_0020/
user: hdadmin
16/08/30 21:59:37 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:38 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:39 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:40 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:41 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:42 INFO Client: Application report for application_1472577335755_0020 (state: FAILED)
16/08/30 21:59:42 INFO Client:
client token: N/A
diagnostics: Application application_1472577335755_0020 failed 2 times due to AM Container for appattempt_1472577335755_0020_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://is-hadoop1nb.gwl.com:8088/proxy/application_1472577335755_0020/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1472577335755_0020_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
at org.apache.hadoop.util.Shell.run(Shell.java:460)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hdadmin
start time: 1472615975456
final status: FAILED
tracking URL: http://is-hadoop1nb.gwl.com:8088/cluster/app/application_1472577335755_0020
user: hdadmin
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:656)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:681)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

 

Here are the AM container logs (note these come from a later run of the same job, application_1472577335755_0022):

 

Log Type: stderr

Log Upload Time: Tue Aug 30 22:10:01 -0600 2016

Log Length: 1803

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/avro-tools-1.7.6-cdh5.4.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/30 22:09:54 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
16/08/30 22:09:55 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1472577335755_0022_000001
16/08/30 22:09:55 INFO spark.SecurityManager: Changing view acls to: yarn,hdadmin
16/08/30 22:09:55 INFO spark.SecurityManager: Changing modify acls to: yarn,hdadmin
16/08/30 22:09:55 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdadmin); users with modify permissions: Set(yarn, hdadmin)
16/08/30 22:09:55 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
16/08/30 22:09:55 INFO yarn.ApplicationMaster: Waiting for spark context initialization
16/08/30 22:09:55 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 
16/08/30 22:09:56 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status was reported.)
16/08/30 22:09:56 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before final status was reported.)
16/08/30 22:09:56 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1472577335755_0022
 

Log Type: stdout

Log Upload Time: Tue Aug 30 22:10:01 -0600 2016

Log Length: 176

Traceback (most recent call last):
  File "network_wordcount.py", line 6, in <module>
    sc = SparkContext(master, "NetworkWordCount")
NameError: name 'master' is not defined

 

Any clue?

4 REPLIES

Contributor

The valuable information is at the very bottom:

NameError: name 'master' is not defined

 

Please make sure you have defined the variable "master" in your code. Alternatively, if you are specifying the master via spark-submit, you should not set it in code at all; a sketch of that option follows.
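For illustration, here is a minimal PySpark sketch of the second option, mirroring the stock Spark Streaming network_wordcount example (the hostname and port are placeholders, not values from the original post):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Deliberately no master argument here: the value passed via
# spark-submit --master (yarn, local[2], etc.) takes effect instead.
sc = SparkContext(appName="NetworkWordCount")
ssc = StreamingContext(sc, 1)  # 1-second batch interval

lines = ssc.socketTextStream("localhost", 9999)  # placeholder host/port
counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()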

New Contributor

Thanks, Umesh.

 

Doesn't it ('master') get defined when the application is executed with a command line like the one below?

 

$ spark-submit --master yarn --deploy-mode cluster network_wordcount.py

 

That is the reason I don't have the variable (master) defined in the code.

Master Collaborator (Accepted Solution)

No, I've never seen such a variable defined by Spark. You can probably look up "spark.master" in the SparkConf. But you don't need to query it in order to make a SparkContext in your app. It looks like you might have modified a standard Spark example, in which case just undo those changes.
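For completeness, a small sketch of looking that value up at runtime; my understanding is that PySpark populates sc.master from the spark.master property that spark-submit fills in, and reading it is entirely optional:

from pyspark import SparkContext

sc = SparkContext(appName="NetworkWordCount")
# spark.master is set by spark-submit; PySpark exposes it as sc.master
print(sc.master)  # e.g. "yarn-cluster" for this submission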


New Contributor

Thanks, that fixed the issue. I got rid of the variable in the code, and I am now able to execute the application in cluster mode.