Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Spark 2.1 on CDH 5.11 can't start a YARN cluster-mode job.

New Contributor

My start script:


#!/bin/bash

set -ux

class=com.palmaplus.amapdata.Main


spark-submit \
--master yarn \
--deploy-mode cluster \
--num-executors 12 \
--driver-cores 1 \
--executor-cores 1 \
--driver-memory 4G \
--executor-memory 2G \
--conf spark.default.parallelism=24 \
--conf spark.shuffle.compress=false \
--conf spark.storage.memoryFraction=0.2 \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-spark.properties -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/hadoop-yarn" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-spark.properties -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/hadoop-yarn" \
--class $class \
--files "/data/projects/superdb/conf/application.conf,/data/projects/superdb/conf/brand-blacklist.txt" \
/data/projects/superdb/jar/amap-data-1.0-SNAPSHOT-jar-with-dependencies.jar

exit 0

 

==================== console log below ====================

 

17/05/16 14:37:11 INFO yarn.Client: Application report for application_1494307194668_0014 (state: ACCEPTED)
17/05/16 14:37:12 INFO yarn.Client: Application report for application_1494307194668_0014 (state: ACCEPTED)
17/05/16 14:37:13 INFO yarn.Client: Application report for application_1494307194668_0014 (state: FAILED)
17/05/16 14:37:13 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1494307194668_0014 failed 2 times due to AM Container for appattempt_1494307194668_0014_000002 exited with exitCode: 15
For more detailed output, check application tracking page:http://master:8088/proxy/application_1494307194668_0014/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1494307194668_0014_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
at org.apache.hadoop.util.Shell.run(Shell.java:504)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.users.hdfs
start time: 1494916610862
final status: FAILED
tracking URL: http://master:8088/cluster/app/application_1494307194668_0014
user: hdfs
Exception in thread "main" org.apache.spark.SparkException: Application application_1494307194668_0014 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1167)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1213)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/05/16 14:37:13 INFO util.ShutdownHookManager: Shutdown hook called
17/05/16 14:37:13 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-d231fd6a-4752-4dc1-b041-080756d9c5aa
+ exit 0

 

==================== yarn logs -applicationId application_1494307194668_0014 ====================

 

 

17/05/16 14:39:14 INFO client.RMProxy: Connecting to ResourceManager at master/10.0.25.5:8032


Container: container_1494307194668_0014_02_000001 on slave2_8041
==================================================================
LogType:stderr
Log Upload Time:Tue May 16 14:37:14 +0800 2017
LogLength:1742
Log Contents:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/yarn/nm/usercache/hdfs/filecache/43/__spark_libs__8927180205054833354.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/05/16 14:37:11 INFO SignalUtils: Registered signal handler for TERM
17/05/16 14:37:11 INFO SignalUtils: Registered signal handler for HUP
17/05/16 14:37:11 INFO SignalUtils: Registered signal handler for INT
17/05/16 14:37:12 INFO ApplicationMaster: Preparing Local resources
17/05/16 14:37:12 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1494307194668_0014_000002
17/05/16 14:37:12 INFO SecurityManager: Changing view acls to: yarn,hdfs
17/05/16 14:37:12 INFO SecurityManager: Changing modify acls to: yarn,hdfs
17/05/16 14:37:12 INFO SecurityManager: Changing view acls groups to:
17/05/16 14:37:12 INFO SecurityManager: Changing modify acls groups to:
17/05/16 14:37:12 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
17/05/16 14:37:12 INFO ApplicationMaster: Starting the user application in a separate Thread
17/05/16 14:37:12 INFO ApplicationMaster: Waiting for spark context initialization...

LogType:stdout
Log Upload Time:Tue May 16 14:37:14 +0800 2017
LogLength:0
Log Contents:

 

Container: container_1494307194668_0014_01_000001 on slave3_8041
==================================================================
LogType:stderr
Log Upload Time:Tue May 16 14:37:13 +0800 2017
LogLength:1742
Log Contents:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/yarn/nm/usercache/hdfs/filecache/44/__spark_libs__8927180205054833354.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/05/16 14:36:58 INFO SignalUtils: Registered signal handler for TERM
17/05/16 14:36:58 INFO SignalUtils: Registered signal handler for HUP
17/05/16 14:36:58 INFO SignalUtils: Registered signal handler for INT
17/05/16 14:37:06 INFO ApplicationMaster: Preparing Local resources
17/05/16 14:37:07 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1494307194668_0014_000001
17/05/16 14:37:07 INFO SecurityManager: Changing view acls to: yarn,hdfs
17/05/16 14:37:07 INFO SecurityManager: Changing modify acls to: yarn,hdfs
17/05/16 14:37:07 INFO SecurityManager: Changing view acls groups to:
17/05/16 14:37:07 INFO SecurityManager: Changing modify acls groups to:
17/05/16 14:37:07 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
17/05/16 14:37:07 INFO ApplicationMaster: Starting the user application in a separate Thread
17/05/16 14:37:07 INFO ApplicationMaster: Waiting for spark context initialization...

LogType:stdout
Log Upload Time:Tue May 16 14:37:13 +0800 2017
LogLength:0
Log Contents:

 

Any help would be appreciated.

1 ACCEPTED SOLUTION

Master Collaborator

This is an error caused by your application rather than a Spark issue: exit code 15 from the ApplicationMaster corresponds to an exception thrown in the user class. You need to find the driver and executor logs from the app and see what happened.
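As an illustration of where to look, aggregated logs for a finished YARN application can be pulled with the YARN CLI (the application id here is taken from the console output above; adjust for your own run):

```shell
#!/bin/bash
# Sketch only: run as a user with access to the aggregated logs (e.g. hdfs).
APP_ID=application_1494307194668_0014

# Dump all container logs (AM/driver is usually container *_000001).
yarn logs -applicationId "$APP_ID" > "${APP_ID}.log"

# Look for the first exception thrown by the user class.
grep -n -A 20 "Exception" "${APP_ID}.log" | head -60
```

In cluster deploy mode the driver runs inside the ApplicationMaster container, so the stack trace of the real failure is in that container's stderr, not in the spark-submit console output.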


2 REPLIES


New Contributor

You're right. The cause was that my application didn't initialize a SparkContext until after it received a message from Kafka, so the ApplicationMaster gave up waiting (hence "Waiting for spark context initialization..." in the AM logs above) and the attempt failed.
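For future readers, a minimal sketch of this kind of fix, assuming a plain Spark 2.1 Scala application (the object name, app name, and the Kafka helper are illustrative, not the original code):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object Main {
  def main(args: Array[String]): Unit = {
    // In yarn-cluster mode the ApplicationMaster waits only a limited time
    // for SparkContext initialization, so create the context first...
    val conf = new SparkConf().setAppName("amap-data")
    val sc = new SparkContext(conf)

    // ...and only then do blocking work such as waiting for a Kafka message.
    // val msg = waitForKafkaMessage()   // hypothetical blocking call

    sc.stop()
  }
}
```

The key ordering constraint: any call that can block indefinitely (polling Kafka, reading stdin, waiting on a lock) must come after the SparkContext is constructed, otherwise the AM times out and YARN fails the attempt.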