Created on 08-30-2016 09:22 PM - edited 09-16-2022 03:37 AM
CDH 5.4
I am running my Spark Streaming application with spark-submit on yarn-cluster. When I run it in local mode it works fine, but when I run it on yarn-cluster via spark-submit, it runs for some time and then exits with the following exception:
$ spark-submit --master yarn --deploy-mode cluster network_wordcount.py
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/avro-tools-1.7.6-cdh5.4.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/30 21:59:34 INFO RMProxy: Connecting to ResourceManager at is-hadoop1nb.gwl.com/143.199.102.180:8032
16/08/30 21:59:34 INFO Client: Requesting a new application from cluster with 3 NodeManagers
16/08/30 21:59:34 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (29696 MB per container)
16/08/30 21:59:34 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/08/30 21:59:34 INFO Client: Setting up container launch context for our AM
16/08/30 21:59:34 INFO Client: Preparing resources for our AM container
16/08/30 21:59:35 INFO Client: Uploading resource file:/home/hdadmin/junk/network_wordcount.py -> hdfs://is-hadoop2nb.gwl.com:8020/user/hdadmin/.sparkStaging/application_1472577335755_0020/network_wordcount.py
16/08/30 21:59:35 INFO Client: Setting up the launch environment for our AM container
16/08/30 21:59:35 INFO SecurityManager: Changing view acls to: hdadmin
16/08/30 21:59:35 INFO SecurityManager: Changing modify acls to: hdadmin
16/08/30 21:59:35 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdadmin); users with modify permissions: Set(hdadmin)
16/08/30 21:59:35 INFO Client: Submitting application 20 to ResourceManager
16/08/30 21:59:35 INFO YarnClientImpl: Submitted application application_1472577335755_0020
16/08/30 21:59:36 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:36 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hdadmin
start time: 1472615975456
final status: UNDEFINED
tracking URL: http://is-hadoop1nb.gwl.com:8088/proxy/application_1472577335755_0020/
user: hdadmin
16/08/30 21:59:37 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:38 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:39 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:40 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:41 INFO Client: Application report for application_1472577335755_0020 (state: ACCEPTED)
16/08/30 21:59:42 INFO Client: Application report for application_1472577335755_0020 (state: FAILED)
16/08/30 21:59:42 INFO Client:
client token: N/A
diagnostics: Application application_1472577335755_0020 failed 2 times due to AM Container for appattempt_1472577335755_0020_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://is-hadoop1nb.gwl.com:8088/proxy/application_1472577335755_0020/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1472577335755_0020_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
at org.apache.hadoop.util.Shell.run(Shell.java:460)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hdadmin
start time: 1472615975456
final status: FAILED
tracking URL: http://is-hadoop1nb.gwl.com:8088/cluster/app/application_1472577335755_0020
user: hdadmin
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:656)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:681)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The container logs from the YARN application tracking page:
Log Type: stderr
Log Upload Time: Tue Aug 30 22:10:01 -0600 2016
Log Length: 1803
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/jars/avro-tools-1.7.6-cdh5.4.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/30 22:09:54 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
16/08/30 22:09:55 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1472577335755_0022_000001
16/08/30 22:09:55 INFO spark.SecurityManager: Changing view acls to: yarn,hdadmin
16/08/30 22:09:55 INFO spark.SecurityManager: Changing modify acls to: yarn,hdadmin
16/08/30 22:09:55 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdadmin); users with modify permissions: Set(yarn, hdadmin)
16/08/30 22:09:55 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
16/08/30 22:09:55 INFO yarn.ApplicationMaster: Waiting for spark context initialization
16/08/30 22:09:55 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
16/08/30 22:09:56 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status was reported.)
16/08/30 22:09:56 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before final status was reported.)
16/08/30 22:09:56 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1472577335755_0022
Log Type: stdout
Log Upload Time: Tue Aug 30 22:10:01 -0600 2016
Log Length: 176
Traceback (most recent call last):
  File "network_wordcount.py", line 6, in <module>
    sc = SparkContext(master, "NetworkWordCount")
NameError: name 'master' is not defined
Any clue?
Created on 08-30-2016 10:04 PM - edited 08-30-2016 10:22 PM
The valuable information is at the very bottom:
NameError: name 'master' is not defined
Please make sure you have defined the variable "master" in your code. Or, if you are specifying the master via spark-submit, do not set it in the code at all.
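A minimal sketch of both options, assuming a PySpark app modeled on the standard streaming word-count example (illustrative only, not the poster's actual file):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Option 1: do not name a master in code; spark-submit's --master
# flag supplies it, so the same file runs both locally and on
# yarn-cluster.
sc = SparkContext(appName="NetworkWordCount")

# Option 2 (hard-codes local mode; only for quick local testing):
# sc = SparkContext("local[2]", "NetworkWordCount")

ssc = StreamingContext(sc, 1)  # 1-second batch interval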
Created 08-31-2016 06:34 AM
Thanks Umesh.
Doesn't it ('master') get defined when the job is executed with a command line like the one below?
$ spark-submit --master yarn --deploy-mode cluster network_wordcount.py
That's the reason I don't have the variable (master) defined in the code.
Created 08-31-2016 06:47 AM
No, I've never seen such a variable defined by Spark. You can probably look up "spark.master" in the SparkConf. But you don't need to query it in order to make a SparkContext in your app. It looks like you might have modified a standard Spark example, in which case just undo those changes.
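If the application really does need the master string, it can be read back from the conf after the context is created rather than defined by hand. A hedged sketch:

from pyspark import SparkContext

# Let spark-submit's --master flag populate "spark.master".
sc = SparkContext(appName="NetworkWordCount")

# Query the effective master from the SparkConf instead of
# relying on a hand-defined 'master' variable.
master = sc.getConf().get("spark.master")
print(master)  # e.g. "yarn-cluster" for this submission mode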
Created 08-31-2016 12:20 PM
Thanks, that fixed the issue. After getting rid of the variable in the code, I am able to execute it in cluster mode.