
Issue running Spark application in yarn-cluster mode

Explorer

CDH 5.4

 

I am running my Spark Streaming application with spark-submit on yarn-cluster. It works fine in local mode, but when I try to run it on yarn-cluster via spark-submit, it runs for some time and then exits with the following exception:

 

$ spark-submit --master yarn --deploy-mode cluster network_wordcount.py

 

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/avro-tools-1.7.6-cdh5.4.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/30 21:59:34 INFO RMProxy: Connecting to ResourceManager at is-hadoop1nb.gwl.com/143.199.102.180:8032
16/08/30 21:59:34 INFO Client: Requesting a new application from cluster with 3 NodeManagers
16/08/30 21:59:34 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (29696 MB per container)
16/08/30 21:59:34 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/08/30 21:59:34 INFO Client: Setting up container launch context for our AM
16/08/30 21:59:34 INFO Client: Preparing resources for our AM container
16/08/30 21:59:35 INFO Client: Uploading resource file:/home/hdadmin/junk/network_wordcount.py -> hdfs://is-hadoop2nb.gwl.com:8020/user/hdadmin/.sparkStaging/application_1472577335755_0020/network_wordcount.py
16/08/30 21:59:35 INFO Client: Setting up the launch environment for our AM container
16/08/30 21:59:35 INFO SecurityManager: Changing view acls to: hdadmin
16/08/30 21:59:35 INFO SecurityManager: Changing modify acls to: hdadmin
16/08/30 21:59:35 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdadmin); users with modify permissions: Set(hdadmin)
16/08/30 21:59:35 INFO Client: Submitting application 20 to ResourceManager
16/08/30 21:59:35 INFO YarnClientImpl: Submitted application application_1472577335755_0020
16/08/30 21:59:36 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:36 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hdadmin
start time: 1472615975456
final status: UNDEFINED
tracking URL: http://is-hadoop1nb.gwl.com:8088/proxy/application_1472577335755_0020/
user: hdadmin
16/08/30 21:59:37 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:38 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:39 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:40 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:41 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:42 INFO Client: Application report for application_1472577335755_0020 (state: FAILED)
16/08/30 21:59:42 INFO Client:
client token: N/A
diagnostics: Application application_1472577335755_0020 failed 2 times due to AM Container for appattempt_1472577335755_0020_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://is-hadoop1nb.gwl.com:8088/proxy/application_1472577335755_0020/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1472577335755_0020_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
at org.apache.hadoop.util.Shell.run(Shell.java:460)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hdadmin
start time: 1472615975456
final status: FAILED
tracking URL: http://is-hadoop1nb.gwl.com:8088/cluster/app/application_1472577335755_0020
user: hdadmin
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:656)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:681)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

 


Container logs from the YARN application tracking page:

 

Log Type: stderr

Log Upload Time: Tue Aug 30 22:10:01 -0600 2016

Log Length: 1803

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/avro-tools-1.7.6-cdh5.4.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/30 22:09:54 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
16/08/30 22:09:55 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1472577335755_0022_000001
16/08/30 22:09:55 INFO spark.SecurityManager: Changing view acls to: yarn,hdadmin
16/08/30 22:09:55 INFO spark.SecurityManager: Changing modify acls to: yarn,hdadmin
16/08/30 22:09:55 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdadmin); users with modify permissions: Set(yarn, hdadmin)
16/08/30 22:09:55 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
16/08/30 22:09:55 INFO yarn.ApplicationMaster: Waiting for spark context initialization
16/08/30 22:09:55 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 
16/08/30 22:09:56 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status was reported.)
16/08/30 22:09:56 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before final status was reported.)
16/08/30 22:09:56 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1472577335755_0022
 

Log Type: stdout

Log Upload Time: Tue Aug 30 22:10:01 -0600 2016

Log Length: 176

 


Traceback (most recent call last):
  File "network_wordcount.py", line 6, in <module>
    sc = SparkContext(master, "NetworkWordCount")
NameError: name 'master' is not defined

 

Any clue?  

1 ACCEPTED SOLUTION

Master Collaborator

No, I've never seen such a variable defined by Spark. You can probably look up "spark.master" in the SparkConf, but you don't need to query it in order to create a SparkContext in your app. It looks like you might have modified a standard Spark example; if so, just undo those changes.
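
For reference, here is a minimal sketch of how the top of network_wordcount.py could look if it is based on the standard Spark Streaming example (the app name and batch interval below are illustrative, not taken from the original code):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Do not pass a master here; spark-submit supplies it via
# --master yarn --deploy-mode cluster
sc = SparkContext(appName="NetworkWordCount")
ssc = StreamingContext(sc, 1)  # 1-second batch interval

# If the code really needs to know which master was chosen,
# read it back from the configuration instead of a local variable:
print(sc.getConf().get("spark.master"))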

View solution in original post

4 REPLIES

Rising Star

The valuable information is at the very bottom:

NameError: name 'master' is not defined

 

Please make sure you have defined the variable "master" in your code. Alternatively, if you are specifying the master via spark-submit, you should not set it in code at all.
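
For example (a sketch; the app name is illustrative):

from pyspark import SparkContext

# Option 1: leave the master out of the code and let spark-submit set it
# (this is what you want for yarn-cluster runs)
sc = SparkContext(appName="NetworkWordCount")

# Option 2: hard-code a master in the code, e.g. for local testing only
# sc = SparkContext(master="local[2]", appName="NetworkWordCount")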

Explorer

Thanks, Umesh.

 

Doesn't it ('master') get defined when the application is executed with a command line like the one below?

 

$ spark-submit --master yarn --deploy-mode cluster network_wordcount.py

 

That's the reason I don't have the variable (master) defined in the code.

Master Collaborator

No, I've never seen such a variable defined by Spark. You can probably look up "spark.master" in the SparkConf, but you don't need to query it in order to create a SparkContext in your app. It looks like you might have modified a standard Spark example; if so, just undo those changes.

Explorer

Thanks, that fixed the issue. After getting rid of the variable in the code, I am able to execute it in cluster mode.