FileNotFoundException when launching a Spark job using YARN
Labels: Apache Hadoop, Apache Spark, Apache YARN
Created ‎03-23-2019 01:03 AM
Hi,
I would like to understand the behavior of SparkLauncherSparkShellProcess when it uses YARN.
Using Kylo (data lake), when the SparkLauncherSparkShellProcess is launched, why does RawLocalFileSystem use the deprecatedGetFileStatus API? Where does this method look for the file, and with what permissions?
The spark-submit is failing with the exception below. I verified that the files are present in the staging directory on the local file system of the edge node and that the user has read access to them.
However, org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus throws a java.io.FileNotFoundException.
The Spark master is set to yarn, deployMode is cluster, and the job connects to the EMR cluster from the edge node.
Spark Version: 2.3.1
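For reference, the setup described above corresponds to launcher configuration along these lines (these are standard Spark property names; the values simply restate the environment described here, not a confirmed working config):

```properties
# Illustrative restatement of the setup described above
spark.master=yarn
spark.submit.deployMode=cluster
# Staging dir as configured in this environment (local file system of the edge node)
spark.yarn.stagingDir=file:///spark_stage
```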
StackTrace:
2019-03-22 22:04:38 DEBUG LauncherServer-3:SparkLauncherSparkShellProcess:270 - Spark Shell client [d6242128-5e08-46d0-9f44-40b787f9d4c4] state changed: CONNECTED
2019-03-22 22:04:38 INFO main:Client:54 - Requesting a new application from cluster with 2 NodeManagers
2019-03-22 22:04:38 INFO main:Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (11712 MB per container)
2019-03-22 22:04:38 INFO main:Client:54 - Will allocate AM container, with 1408 MB memory including 384 MB overhead
2019-03-22 22:04:38 INFO main:Client:54 - Setting up container launch context for our AM
2019-03-22 22:04:38 INFO main:Client:54 - Setting up the launch environment for our AM container
2019-03-22 22:04:38 INFO main:Client:54 - Preparing resources for our AM container
2019-03-22 22:04:38 DEBUG main:HadoopDelegationTokenManager:58 - Service hadoopfs does not require a token. Check your configuration to see if security is disabled or not.
2019-03-22 22:04:38 DEBUG main:HadoopDelegationTokenManager:58 - Service hive does not require a token. Check your configuration to see if security is disabled or not.
2019-03-22 22:04:39 DEBUG main:HadoopDelegationTokenManager:58 - Service hbase does not require a token. Check your configuration to see if security is disabled or not.
2019-03-22 22:04:39 DEBUG main:Client:58 -
2019-03-22 22:04:39 WARN main:Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/tmp/spark-2929fcc8-24df-4e6d-948c-96153dd56ec3/__spark_libs__7437371065422767124.zip -> /spark_stage/.sparkStaging/application_1553289922590_0003/__spark_libs__7437371065422767124.zip
2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/usr/lib/spark/jars/kylo-spark-shell-client-v2-0.9.1.3.jar -> /spark_stage/.sparkStaging/application_1553289922590_0003/kylo-spark-shell-client-v2-0.9.1.3.jar
2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/usr/lib/spark/jars/hbase-common-1.4.4.jar -> /spark_stage/.sparkStaging/application_1553289922590_0003/hbase-common-1.4.4.jar
2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/opt/kylo/kylo-services/conf/log4j.properties -> /spark_stage/.sparkStaging/application_1553289922590_0003/log4j.properties
2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/opt/kylo/kylo-services/conf/spark.properties -> /spark_stage/.sparkStaging/application_1553289922590_0003/spark.properties
2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/etc/hadoop/conf/hbase-site.xml -> /spark_stage/.sparkStaging/application_1553289922590_0003/hbase-site.xml
2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/opt/kylo/ssl/kylo-ui.jks -> /spark_stage/.sparkStaging/application_1553289922590_0003/kylo-ui.jks
2019-03-22 22:04:43 INFO main:Client:54 - Uploading resource file:/tmp/spark-2929fcc8-24df-4e6d-948c-96153dd56ec3/__spark_conf__2249993461795431943.zip -> /spark_stage/.sparkStaging/application_1553289922590_0003/__spark_conf__.zip
2019-03-22 22:04:43 DEBUG main:Client:58 - ===============================================================================
2019-03-22 22:04:43 DEBUG main:Client:58 - YARN AM launch context:
2019-03-22 22:04:43 DEBUG main:Client:58 - user class: com.thinkbiganalytics.spark.SparkShellApp
2019-03-22 22:04:43 DEBUG main:Client:58 - env:
2019-03-22 22:04:43 DEBUG main:Client:58 - CLASSPATH -> /opt/kylo/kylo-services/conf:/opt/kylo/kylo-services/lib/mariadb-java-client-1.5.7.jar<CPS>{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
2019-03-22 22:04:43 DEBUG main:Client:58 - HADOOP_CONF_DIR -> /etc/hadoop
2019-03-22 22:04:43 DEBUG main:Client:58 - SPARK_YARN_STAGING_DIR -> /spark_stage/.sparkStaging/application_1553289922590_0003
2019-03-22 22:04:43 DEBUG main:Client:58 - SPARK_USER -> kylo
2019-03-22 22:04:43 DEBUG main:Client:58 - KYLO_CLIENT_ID -> d6242128-5e08-46d0-9f44-40b787f9d4c2
2019-03-22 22:04:43 DEBUG main:Client:58 - PYTHONHASHSEED -> 0
2019-03-22 22:04:43 DEBUG main:Client:58 - KYLO_CLIENT_SECRET -> *********(redacted)
2019-03-22 22:04:43 DEBUG main:Client:58 - YARN_CONF_DIR -> /etc/hadoop
2019-03-22 22:04:43 DEBUG main:Client:58 - resources:
2019-03-22 22:04:43 DEBUG main:Client:58 - log4j.properties -> resource { scheme: "file" port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/log4j.properties" } size: 2068 timestamp: 1553292283000 type: FILE visibility: PRIVATE
2019-03-22 22:04:43 DEBUG main:Client:58 - kylo-ui.jks -> resource { scheme: "file" port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/kylo-ui.jks" } size: 2091 timestamp: 1553292283000 type: FILE visibility: PRIVATE
2019-03-22 22:04:43 DEBUG main:Client:58 - __app__.jar -> resource { scheme: "file" port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/kylo-spark-shell-client-v2-0.9.1.3.jar" } size: 28054378 timestamp: 1553292283000 type: FILE visibility: PRIVATE
2019-03-22 22:04:43 DEBUG main:Client:58 - spark.properties -> resource { scheme: "file" port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/spark.properties" } size: 4127 timestamp: 1553292283000 type: FILE visibility: PRIVATE
2019-03-22 22:04:43 DEBUG main:Client:58 - __spark_libs__ -> resource { scheme: "file" port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/__spark_libs__7437371065422767124.zip" } size: 256569291 timestamp: 1553292283000 type: ARCHIVE visibility: PRIVATE
2019-03-22 22:04:43 DEBUG main:Client:58 - __spark_conf__ -> resource { port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/__spark_conf__.zip" } size: 102161 timestamp: 1553292283000 type: ARCHIVE visibility: PRIVATE
2019-03-22 22:04:43 DEBUG main:Client:58 - hbase-common-1.4.4.jar -> resource { scheme: "file" port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/hbase-common-1.4.4.jar" } size: 621182 timestamp: 1553292283000 type: FILE visibility: PRIVATE
2019-03-22 22:04:43 DEBUG main:Client:58 - hbase-site.xml -> resource { scheme: "file" port: -1 file: "/spark_stage/.sparkStaging/application_1553289922590_0003/hbase-site.xml" } size: 1529 timestamp: 1553292283000 type: FILE visibility: PRIVATE
2019-03-22 22:04:43 DEBUG main:Client:58 - command:
2019-03-22 22:04:43 DEBUG main:Client:58 - {{JAVA_HOME}}/bin/java -server -Xmx1024m -Djava.io.tmpdir={{PWD}}/tmp '-Dlog4j.configuration=log4j.properties' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.thinkbiganalytics.spark.SparkShellApp' --jar file:/usr/lib/spark/jars/kylo-spark-shell-client-v2-0.9.1.3.jar --arg '--idle-timeout' --arg '900' --arg '--port-max' --arg '45999' --arg '--port-min' --arg '45000' --arg '--server-url' --arg 'https://host:port/proxy/v1/spark/shell/register' --arg '--server-keystore-path' --arg 'xx.jks' --arg '--server-keystore-password' --arg 'yy' --properties-file {{PWD}}/__spark_conf__/__spark_conf__.properties 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
2019-03-22 22:04:43 DEBUG main:Client:58 - ===============================================================================
2019-03-22 22:04:43 DEBUG main:Client:58 - spark.yarn.maxAppAttempts is not set. Cluster's default value will be used.
2019-03-22 22:04:43 INFO main:Client:54 - Submitting application application_1553289922590_0003 to ResourceManager
2019-03-22 22:04:43 DEBUG LauncherServer-3:SparkLauncherSparkShellProcess:208 - Spark Shell client [d6242128-5e08-46d0-9f44-40b787f9d4c4] application id: application_1553289922590_0003
2019-03-22 22:04:43 DEBUG LauncherServer-3:SparkLauncherSparkShellProcess:270 - Spark Shell client [d6242128-5e08-46d0-9f44-40b787f9d4c4] state changed: SUBMITTED
2019-03-22 22:04:44 INFO main:Client:54 - Application report for application_1553289922590_0003 (state: ACCEPTED)
2019-03-22 22:04:44 DEBUG main:Client:58 -
client token: N/A
diagnostics: [Fri Mar 22 22:04:44 +0000 2019] Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:23424, vCores:16> ; Queue's Absolute capacity = 100.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ;
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1553292283963
final status: UNDEFINED
tracking URL: http://fqdn:20888/proxy/application_1553289922590_0003/
user: kylo
2019-03-22 22:04:45 INFO main:Client:54 - Application report for application_1553289922590_0003 (state: FAILED)
2019-03-22 22:04:45 DEBUG main:Client:58 -
client token: N/A
diagnostics: Application application_1553289922590_0003 failed 2 times due to AM Container for appattempt_1553289922590_0003_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: File file:/spark_stage/.sparkStaging/application_1553289922590_0003/__spark_libs__7437371065422767124.zip does not exist
java.io.FileNotFoundException: File file:/spark_stage/.sparkStaging/application_1553289922590_0003/__spark_libs__7437371065422767124.zip does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:640)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:866)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:630)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:452)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Created ‎03-23-2019 07:50 PM
Can someone help me understand where the following method looks for the file in yarn cluster mode connecting to EMR?
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus
As per the code, the method throws:
java.io.FileNotFoundException: File file:/spark_stage/.sparkStaging/application_*/__spark_libs__*.zip does not exist
The file is present in the local file system and accessible by the user launching the Yarn AM.
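One detail that may matter here (my understanding, not a confirmed diagnosis): a file: URI carries no host component, so RawLocalFileSystem resolves it against the local disk of whichever JVM opens it. In cluster mode that is the NodeManager localizer on an EMR node, not the edge node where the files were uploaded. A minimal stdlib sketch of that host-local resolution (path shortened for illustration):

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalUriResolution {
    public static void main(String[] args) {
        // A file: URI like the one in the failing localization step
        URI staged = URI.create("file:///spark_stage/.sparkStaging/app_0003/__spark_libs__.zip");

        // file: URIs have no authority, so resolution is purely host-local
        Path local = Paths.get(staged);
        System.out.println("scheme = " + staged.getScheme());
        System.out.println("resolves to local path: " + local);

        // Whether this path exists depends entirely on which machine runs the
        // check: true on the edge node where spark-submit staged the file,
        // false on an EMR node whose NodeManager tries to localize it
        System.out.println("exists here: " + Files.exists(local));
    }
}
```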
This is happening after the application is submitted to ResourceManager and accepted.
Is there any other property that needs to be set besides the stagingDir? I am not sure whether it is a path issue (file:/spark_stage) or whether it is trying to look for the file in HDFS!
spark.yarn.stagingDir=file:///spark_stage
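One alternative I am considering (an assumption on my part, not a confirmed fix) is pointing the staging dir at a filesystem that every cluster node can see, so the NodeManagers can localize the uploaded resources:

```properties
# Hypothetical alternative: stage on a cluster-visible filesystem
# instead of the edge node's local file:// path (example path only)
spark.yarn.stagingDir=hdfs:///user/kylo/.sparkStaging
```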
As per the stack trace above, org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus is called when org.apache.hadoop.yarn.util.FSDownload.call is invoked.
