Support Questions
Find answers, ask questions, and share your expertise

Spark2.1 in 5.11 can't start a yarn cluster job.

SOLVED


New Contributor

My start script:


#!/bin/bash

set -ux

class=com.palmaplus.amapdata.Main

spark-submit \
--master yarn \
--deploy-mode cluster \
--num-executors 12 \
--driver-cores 1 \
--executor-cores 1 \
--driver-memory 4G \
--executor-memory 2G \
--conf spark.default.parallelism=24 \
--conf spark.shuffle.compress=false \
--conf spark.storage.memoryFraction=0.2 \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-spark.properties -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/hadoop-yarn" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-spark.properties -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/hadoop-yarn" \
--class "$class" \
--files "/data/projects/superdb/conf/application.conf,/data/projects/superdb/conf/brand-blacklist.txt" \
/data/projects/superdb/jar/amap-data-1.0-SNAPSHOT-jar-with-dependencies.jar

exit 0

 

=====================================console log below===============================

 

17/05/16 14:37:11 INFO yarn.Client: Application report for application_1494307194668_0014 (state: ACCEPTED)
17/05/16 14:37:12 INFO yarn.Client: Application report for application_1494307194668_0014 (state: ACCEPTED)
17/05/16 14:37:13 INFO yarn.Client: Application report for application_1494307194668_0014 (state: FAILED)
17/05/16 14:37:13 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1494307194668_0014 failed 2 times due to AM Container for appattempt_1494307194668_0014_000002 exited with exitCode: 15
For more detailed output, check application tracking page:http://master:8088/proxy/application_1494307194668_0014/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1494307194668_0014_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
at org.apache.hadoop.util.Shell.run(Shell.java:504)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.users.hdfs
start time: 1494916610862
final status: FAILED
tracking URL: http://master:8088/cluster/app/application_1494307194668_0014
user: hdfs
Exception in thread "main" org.apache.spark.SparkException: Application application_1494307194668_0014 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1167)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1213)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/05/16 14:37:13 INFO util.ShutdownHookManager: Shutdown hook called
17/05/16 14:37:13 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-d231fd6a-4752-4dc1-b041-080756d9c5aa
+ exit 0

 

================yarn logs -applicationId application_1494307194668_0014=========================

 

 

17/05/16 14:39:14 INFO client.RMProxy: Connecting to ResourceManager at master/10.0.25.5:8032


Container: container_1494307194668_0014_02_000001 on slave2_8041
==================================================================
LogType:stderr
Log Upload Time:Tue May 16 14:37:14 +0800 2017
LogLength:1742
Log Contents:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/yarn/nm/usercache/hdfs/filecache/43/__spark_libs__8927180205054833354.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/05/16 14:37:11 INFO SignalUtils: Registered signal handler for TERM
17/05/16 14:37:11 INFO SignalUtils: Registered signal handler for HUP
17/05/16 14:37:11 INFO SignalUtils: Registered signal handler for INT
17/05/16 14:37:12 INFO ApplicationMaster: Preparing Local resources
17/05/16 14:37:12 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1494307194668_0014_000002
17/05/16 14:37:12 INFO SecurityManager: Changing view acls to: yarn,hdfs
17/05/16 14:37:12 INFO SecurityManager: Changing modify acls to: yarn,hdfs
17/05/16 14:37:12 INFO SecurityManager: Changing view acls groups to:
17/05/16 14:37:12 INFO SecurityManager: Changing modify acls groups to:
17/05/16 14:37:12 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
17/05/16 14:37:12 INFO ApplicationMaster: Starting the user application in a separate Thread
17/05/16 14:37:12 INFO ApplicationMaster: Waiting for spark context initialization...

LogType:stdout
Log Upload Time:Tue May 16 14:37:14 +0800 2017
LogLength:0
Log Contents:

 

Container: container_1494307194668_0014_01_000001 on slave3_8041
==================================================================
LogType:stderr
Log Upload Time:Tue May 16 14:37:13 +0800 2017
LogLength:1742
Log Contents:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/yarn/nm/usercache/hdfs/filecache/44/__spark_libs__8927180205054833354.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/05/16 14:36:58 INFO SignalUtils: Registered signal handler for TERM
17/05/16 14:36:58 INFO SignalUtils: Registered signal handler for HUP
17/05/16 14:36:58 INFO SignalUtils: Registered signal handler for INT
17/05/16 14:37:06 INFO ApplicationMaster: Preparing Local resources
17/05/16 14:37:07 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1494307194668_0014_000001
17/05/16 14:37:07 INFO SecurityManager: Changing view acls to: yarn,hdfs
17/05/16 14:37:07 INFO SecurityManager: Changing modify acls to: yarn,hdfs
17/05/16 14:37:07 INFO SecurityManager: Changing view acls groups to:
17/05/16 14:37:07 INFO SecurityManager: Changing modify acls groups to:
17/05/16 14:37:07 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
17/05/16 14:37:07 INFO ApplicationMaster: Starting the user application in a separate Thread
17/05/16 14:37:07 INFO ApplicationMaster: Waiting for spark context initialization...

LogType:stdout
Log Upload Time:Tue May 16 14:37:13 +0800 2017
LogLength:0
Log Contents:

 

Any help, please?

1 ACCEPTED SOLUTION


Re: Spark2.1 in 5.11 can't start a yarn cluster job.

Master Collaborator

This is an error caused by your application rather than by Spark itself. You need to find the executor logs for the app and see what happened.



Re: Spark2.1 in 5.11 can't start a yarn cluster job.

New Contributor

You're right. The cause was that my code didn't initialize a SparkContext until a message arrived from Kafka, so the ApplicationMaster gave up before the context ever existed.
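For anyone hitting the same symptom: in yarn-cluster mode the ApplicationMaster waits only a limited time (spark.yarn.am.waitTime, 100s by default) for the user class to create a SparkContext, so blocking on external input before creating the context fails the attempt. Below is a minimal sketch of the fix using Spark Streaming's Kafka 0.10 direct-stream integration. The object name, topic, broker address, and group id are hypothetical placeholders, not taken from the original job; the point is only the ordering: create the StreamingContext (and with it the SparkContext) at the top of main, and let Spark's Kafka integration drive message arrival instead of blocking on a plain consumer first.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object Main {
  def main(args: Array[String]): Unit = {
    // Create the context first, so the YARN ApplicationMaster
    // sees an initialized SparkContext right away.
    val conf = new SparkConf().setAppName("amap-data")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Hypothetical Kafka connection settings for illustration only.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "amap-data",
      "auto.offset.reset"  -> "latest"
    )

    // Subscribe through Spark's integration rather than polling
    // Kafka yourself before the context exists.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

    stream.foreachRDD { rdd =>
      // Per-micro-batch processing goes here.
      println(s"batch size: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

This requires the spark-streaming-kafka-0-10 artifact on the classpath and cannot run outside a Spark deployment, so treat it as a structural sketch rather than a drop-in replacement for the original job.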