Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

FileNotFound Exception - launching a spark job using Yarn

avatar
New Contributor

Hi,

I would like to understand the behavior of SparkLauncherSparkShellProcess that uses Yarn.

Using Kylo (dataLake), when the SparkLauncherSparkShellProcess is launched, why does the RawLocalFileSystem use deprecatedGetFileStatus API? Where does this method look for the file and what permissions?

The spark submit is failing with the following exception. I verified the files are present in the staging directory in the local file system of the edge node and the user has read access to the files.

However org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus is throwing java.io.FileNotFoundException.


The spark master is set to yarn and deployMode is cluster and connects to the EMR cluster from the edge node.

Spark Version: 2.3.1


StackTrace:

2019-03-22 22:04:38 DEBUG LauncherServer-3:SparkLauncherSparkShellProcess:270 - Spark Shell client [d6242128-5e08-46d0-9f44-40b787f9d4c4] state changed: CONNECTED

2019-03-22 22:04:38 INFO main:Client:54 - Requesting a new application from cluster with 2 NodeManagers

2019-03-22 22:04:38 INFO main:Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (11712 MB per container)

2019-03-22 22:04:38 INFO main:Client:54 - Will allocate AM container, with 1408 MB memory including 384 MB overhead

2019-03-22 22:04:38 INFO main:Client:54 - Setting up container launch context for our AM

2019-03-22 22:04:38 INFO main:Client:54 - Setting up the launch environment for our AM container

2019-03-22 22:04:38 INFO main:Client:54 - Preparing resources for our AM container

2019-03-22 22:04:38 DEBUG main:HadoopDelegationTokenManager:58 - Service hadoopfs does not require a token. Check your configuration to see if security is disabled or not.

2019-03-22 22:04:38 DEBUG main:HadoopDelegationTokenManager:58 - Service hive does not require a token. Check your configuration to see if security is disabled or not.

2019-03-22 22:04:39 DEBUG main:HadoopDelegationTokenManager:58 - Service hbase does not require a token. Check your configuration to see if security is disabled or not.

2019-03-22 22:04:39 DEBUG main:Client:58 -

2019-03-22 22:04:39 WARN main:Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.

2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/tmp/spark-2929fcc8-24df-4e6d-948c-96153dd56ec3/__spark_libs__7437371065422767124.zip -> /spark_stage/.sparkStaging/application_1553289922590_0003/__spark_libs__7437371065422767124.zip

2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/usr/lib/spark/jars/kylo-spark-shell-client-v2-0.9.1.3.jar -> /spark_stage/.sparkStaging/application_1553289922590_0003/kylo-spark-shell-client-v2-0.9.1.3.jar

2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/usr/lib/spark/jars/hbase-common-1.4.4.jar -> /spark_stage/.sparkStaging/application_1553289922590_0003/hbase-common-1.4.4.jar

2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/opt/kylo/kylo-services/conf/log4j.properties -> /spark_stage/.sparkStaging/application_1553289922590_0003/log4j.properties

2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/opt/kylo/kylo-services/conf/spark.properties -> /spark_stage/.sparkStaging/application_1553289922590_0003/spark.properties

2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/etc/hadoop/conf/hbase-site.xml -> /spark_stage/.sparkStaging/application_1553289922590_0003/hbase-site.xml

2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/opt/kylo/ssl/kylo-ui.jks -> /spark_stage/.sparkStaging/application_1553289922590_0003/kylo-ui.jks

2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/tmp/spark-2929fcc8-24df-4e6d-948c-96153dd56ec3/__spark_conf__2249993461795431943.zip -> /spark_stage/.sparkStaging/application_1553289922590_0003/__spark_conf__.zip

2019-03-22 22:04:43 DEBUG main:Client:58 - ===============================================================================

2019-03-22 22:04:43 DEBUG main:Client:58 - YARN AM launch context:

2019-03-22 22:04:43 DEBUG main:Client:58 - user class: com.thinkbiganalytics.spark.SparkShellApp

2019-03-22 22:04:43 DEBUG main:Client:58 - env:

2019-03-22 22:04:43 DEBUG main:Client:58 - CLASSPATH -> /opt/kylo/kylo-services/conf:/opt/kylo/kylo-services/lib/mariadb-java-client-1.5.7.jar<CPS>{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__

2019-03-22 22:04:43 DEBUG main:Client:58 - HADOOP_CONF_DIR -> /etc/hadoop

2019-03-22 22:04:43 DEBUG main:Client:58 - SPARK_YARN_STAGING_DIR -> /spark_stage/.sparkStaging/application_1553289922590_0003

2019-03-22 22:04:43 DEBUG main:Client:58 - SPARK_USER -> kylo

2019-03-22 22:04:43 DEBUG main:Client:58 - KYLO_CLIENT_ID -> d6242128-5e08-46d0-9f44-40b787f9d4c2

2019-03-22 22:04:43 DEBUG main:Client:58 - PYTHONHASHSEED -> 0

2019-03-22 22:04:43 DEBUG main:Client:58 - KYLO_CLIENT_SECRET -> *********(redacted)

2019-03-22 22:04:43 DEBUG main:Client:58 - YARN_CONF_DIR -> /etc/hadoop

2019-03-22 22:04:43 DEBUG main:Client:58 - resources:

2019-03-22 22:04:43 DEBUG main:Client:58 - log4j.properties -> resource { scheme: "file" port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/log4j.properties" } size: 2068 timestamp: 1553292283000 type: FILE visibility: PRIVATE

2019-03-22 22:04:43 DEBUG main:Client:58 - kylo-ui.jks -> resource { scheme: "file" port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/kylo-ui.jks" } size: 2091 timestamp: 1553292283000 type: FILE visibility: PRIVATE

2019-03-22 22:04:43 DEBUG main:Client:58 - __app__.jar -> resource { scheme: "file" port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/kylo-spark-shell-client-v2-0.9.1.3.jar" } size: 28054378 timestamp: 1553292283000 type: FILE visibility: PRIVATE

2019-03-22 22:04:43 DEBUG main:Client:58 - spark.properties -> resource { scheme: "file" port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/spark.properties" } size: 4127 timestamp: 1553292283000 type: FILE visibility: PRIVATE

2019-03-22 22:04:43 DEBUG main:Client:58 - __spark_libs__ -> resource { scheme: "file" port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/__spark_libs__7437371065422767124.zip" } size: 256569291 timestamp: 1553292283000 type: ARCHIVE visibility: PRIVATE

2019-03-22 22:04:43 DEBUG main:Client:58 - __spark_conf__ -> resource { port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/__spark_conf__.zip" } size: 102161 timestamp: 1553292283000 type: ARCHIVE visibility: PRIVATE

2019-03-22 22:04:43 DEBUG main:Client:58 - hbase-common-1.4.4.jar -> resource { scheme: "file" port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/hbase-common-1.4.4.jar" } size: 621182 timestamp: 1553292283000 type: FILE visibility: PRIVATE

2019-03-22 22:04:43 DEBUG main:Client:58 - hbase-site.xml -> resource { scheme: "file" port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/hbase-site.xml" } size: 1529 timestamp: 1553292283000 type: FILE visibility: PRIVATE

2019-03-22 22:04:43 DEBUG main:Client:58 - command:

2019-03-22 22:04:43 DEBUG main:Client:58 - {{JAVA_HOME}}/bin/java -server -Xmx1024m -Djava.io.tmpdir={{PWD}}/tmp '-Dlog4j.configuration=log4j.properties' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.thinkbiganalytics.spark.SparkShellApp' --jar file:/usr/lib/spark/jars/kylo-spark-shell-client-v2-0.9.1.3.jar --arg '--idle-timeout' --arg '900' --arg '--port-max' --arg '45999' --arg '--port-min' --arg '45000' --arg '--server-url' --arg 'https://host:port/proxy/v1/spark/shell/register' --arg '--server-keystore-path' --arg 'xx.jks' --arg '--server-keystore-password' --arg 'yy' --properties-file {{PWD}}/__spark_conf__/__spark_conf__.properties 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr

2019-03-22 22:04:43 DEBUG main:Client:58 - ===============================================================================

2019-03-22 22:04:43 DEBUG main:Client:58 - spark.yarn.maxAppAttempts is not set. Cluster's default value will be used.

2019-03-22 22:04:43 INFO main:Client:54 - Submitting application application_1553289922590_0003 to ResourceManager

2019-03-22 22:04:43 DEBUG LauncherServer-3:SparkLauncherSparkShellProcess:208 - Spark Shell client [d6242128-5e08-46d0-9f44-40b787f9d4c4] application id: application_1553289922590_0003

2019-03-22 22:04:43 DEBUG LauncherServer-3:SparkLauncherSparkShellProcess:270 - Spark Shell client [d6242128-5e08-46d0-9f44-40b787f9d4c4] state changed: SUBMITTED

2019-03-22 22:04:44 INFO main:Client:54 - Application report for application_1553289922590_0003 (state: ACCEPTED)

2019-03-22 22:04:44 DEBUG main:Client:58 -

client token: N/A

diagnostics: [Fri Mar 22 22:04:44 +0000 2019] Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:23424, vCores:16> ; Queue's Absolute capacity = 100.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ;

ApplicationMaster host: N/A

ApplicationMaster RPC port: -1

queue: default

start time: 1553292283963

final status: UNDEFINED

tracking URL: http://fqdn:20888/proxy/application_1553289922590_0003/

user: kylo

2019-03-22 22:04:45 INFO main:Client:54 - Application report for application_1553289922590_0003 (state: FAILED)

2019-03-22 22:04:45 DEBUG main:Client:58 -

client token: N/A

diagnostics: Application application_1553289922590_0003 failed 2 times due to AM Container for appattempt_1553289922590_0003_000002 exited with exitCode: -1000

Failing this attempt.Diagnostics: File file:/spark_stage/.sparkStaging/application_1553289922590_0003/__spark_libs__7437371065422767124.zip does not exist

java.io.FileNotFoundException: File file:/spark_stage/.sparkStaging/application_1553289922590_0003/__spark_libs__7437371065422767124.zip does not exist

at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:640)

at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:866)

at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:630)

at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:452)

at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)

at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)

at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)

at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)

at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)

at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)


1 REPLY 1

avatar
New Contributor

Can someone help me understand where does the following method look for the file in the yarn cluster mode connecting to EMR?

org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus


As per the code, the method is throwing

java.io.FileNotFoundException: File file:/spark_stage/.sparkStaging/application_*/__spark_libs__*.zip does not exist


The file is present in the local file system and accessible by the user launching the Yarn AM.

This is happening after the application is submitted to ResourceManager and accepted.

Are there any other property that needs to be set other than the stagingDir? I am not sure if its a path issue (file:/spark_stage) or it is trying to look for the file in the HDFS!

spark.yarn.stagingDir=file:///spark_stage

As per the stack trace, the org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus is called when org.apache.hadoop.yarn.util.FSDownload.call is invoked.


at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:640)

at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:866)

at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:630)

at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:452)

at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)

at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)

at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)

at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)

at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)

at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)