New Contributor
Posts: 2
Registered: ‎05-15-2017

Spark 2.1 on CDH 5.11 can't start a YARN cluster job.

My start script:


#!/bin/bash

set -ux

class=com.palmaplus.amapdata.Main


spark-submit \
--master yarn \
--deploy-mode cluster \
--num-executors 12 \
--driver-cores 1 \
--executor-cores 1 \
--driver-memory 4G \
--executor-memory 2G \
--conf spark.default.parallelism=24 \
--conf spark.shuffle.compress=false \
--conf spark.storage.memoryFraction=0.2 \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-spark.properties -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/hadoop-yarn" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-spark.properties -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/hadoop-yarn" \
--class $class \
--files "/data/projects/superdb/conf/application.conf,/data/projects/superdb/conf/brand-blacklist.txt" \
/data/projects/superdb/jar/amap-data-1.0-SNAPSHOT-jar-with-dependencies.jar

# note: this always reports success, even when spark-submit fails
exit 0

 

=====================================console log below===============================

 

17/05/16 14:37:11 INFO yarn.Client: Application report for application_1494307194668_0014 (state: ACCEPTED)
17/05/16 14:37:12 INFO yarn.Client: Application report for application_1494307194668_0014 (state: ACCEPTED)
17/05/16 14:37:13 INFO yarn.Client: Application report for application_1494307194668_0014 (state: FAILED)
17/05/16 14:37:13 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1494307194668_0014 failed 2 times due to AM Container for appattempt_1494307194668_0014_000002 exited with exitCode: 15
For more detailed output, check application tracking page:http://master:8088/proxy/application_1494307194668_0014/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1494307194668_0014_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
at org.apache.hadoop.util.Shell.run(Shell.java:504)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.users.hdfs
start time: 1494916610862
final status: FAILED
tracking URL: http://master:8088/cluster/app/application_1494307194668_0014
user: hdfs
Exception in thread "main" org.apache.spark.SparkException: Application application_1494307194668_0014 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1167)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1213)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/05/16 14:37:13 INFO util.ShutdownHookManager: Shutdown hook called
17/05/16 14:37:13 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-d231fd6a-4752-4dc1-b041-080756d9c5aa
+ exit 0

 

================yarn logs -applicationId application_1494307194668_0014=========================

 

 

17/05/16 14:39:14 INFO client.RMProxy: Connecting to ResourceManager at master/10.0.25.5:8032


Container: container_1494307194668_0014_02_000001 on slave2_8041
==================================================================
LogType:stderr
Log Upload Time:Tue May 16 14:37:14 +0800 2017
LogLength:1742
Log Contents:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/yarn/nm/usercache/hdfs/filecache/43/__spark_libs__8927180205054833354.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/05/16 14:37:11 INFO SignalUtils: Registered signal handler for TERM
17/05/16 14:37:11 INFO SignalUtils: Registered signal handler for HUP
17/05/16 14:37:11 INFO SignalUtils: Registered signal handler for INT
17/05/16 14:37:12 INFO ApplicationMaster: Preparing Local resources
17/05/16 14:37:12 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1494307194668_0014_000002
17/05/16 14:37:12 INFO SecurityManager: Changing view acls to: yarn,hdfs
17/05/16 14:37:12 INFO SecurityManager: Changing modify acls to: yarn,hdfs
17/05/16 14:37:12 INFO SecurityManager: Changing view acls groups to:
17/05/16 14:37:12 INFO SecurityManager: Changing modify acls groups to:
17/05/16 14:37:12 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
17/05/16 14:37:12 INFO ApplicationMaster: Starting the user application in a separate Thread
17/05/16 14:37:12 INFO ApplicationMaster: Waiting for spark context initialization...

LogType:stdout
Log Upload Time:Tue May 16 14:37:14 +0800 2017
LogLength:0
Log Contents:

 

Container: container_1494307194668_0014_01_000001 on slave3_8041
==================================================================
LogType:stderr
Log Upload Time:Tue May 16 14:37:13 +0800 2017
LogLength:1742
Log Contents:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/yarn/nm/usercache/hdfs/filecache/44/__spark_libs__8927180205054833354.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/05/16 14:36:58 INFO SignalUtils: Registered signal handler for TERM
17/05/16 14:36:58 INFO SignalUtils: Registered signal handler for HUP
17/05/16 14:36:58 INFO SignalUtils: Registered signal handler for INT
17/05/16 14:37:06 INFO ApplicationMaster: Preparing Local resources
17/05/16 14:37:07 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1494307194668_0014_000001
17/05/16 14:37:07 INFO SecurityManager: Changing view acls to: yarn,hdfs
17/05/16 14:37:07 INFO SecurityManager: Changing modify acls to: yarn,hdfs
17/05/16 14:37:07 INFO SecurityManager: Changing view acls groups to:
17/05/16 14:37:07 INFO SecurityManager: Changing modify acls groups to:
17/05/16 14:37:07 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
17/05/16 14:37:07 INFO ApplicationMaster: Starting the user application in a separate Thread
17/05/16 14:37:07 INFO ApplicationMaster: Waiting for spark context initialization...

LogType:stdout
Log Upload Time:Tue May 16 14:37:13 +0800 2017
LogLength:0
Log Contents:

 

Any help, please?

Cloudera Employee
Posts: 418
Registered: ‎08-11-2014

Re: Spark 2.1 on CDH 5.11 can't start a YARN cluster job.

This is an error caused by your application rather than a Spark issue. You need to find the executor logs for the app and see what happened.

New Contributor
Posts: 2
Registered: ‎05-15-2017

Re: Spark 2.1 on CDH 5.11 can't start a YARN cluster job.

You're right. The cause was that my application didn't initialize a SparkContext until after it received a message from Kafka, so the ApplicationMaster timed out at "Waiting for spark context initialization...".
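
For anyone who hits the same failure: in yarn-cluster mode the ApplicationMaster waits only a limited time (`spark.yarn.am.waitTime`, 100s by default) for the user class to create a SparkContext before failing the attempt. A minimal Scala sketch of the fix (the class and helper names here are hypothetical, not the actual com.palmaplus.amapdata.Main): create the context first, and only then block on Kafka.

```scala
import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    // Create the SparkSession (and thus the SparkContext) FIRST: the YARN
    // ApplicationMaster only waits spark.yarn.am.waitTime for this to happen.
    val spark = SparkSession.builder()
      .appName("amap-data")
      .getOrCreate()

    // Only once the context exists, block on external input such as Kafka.
    val msg = waitForKafkaMessage() // hypothetical blocking consumer
    // ... process msg with spark ...

    spark.stop()
  }

  // Placeholder for the real Kafka consumer logic.
  def waitForKafkaMessage(): String = ???
}
```

Any setup step that can block indefinitely (waiting on Kafka, polling a database, and so on) should happen after `getOrCreate()`, otherwise the AM never sees a live context and fails with exit code 15 as above.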
