
job fails when submitted via yarn scheduler


Master Collaborator

This Python script runs fine when submitted without the YARN scheduler (spark-submit Friends-By-Age.py), but when submitted via the YARN scheduler it fails with the error shown below. Even though the output initially says it is copying the py4j file to the right place, the scheduler then can't find it.

I also noted that there are no directories under "/user/hive/.sparkStaging/". I am not sure whether that is the issue, or whether the directory is simply deleted after the job completes.

Also, the log file for the job does not exist. If I check the given URL for logs, I get this:

[screenshot: 11133-capture.jpg]
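When the tracking URL shows nothing, the aggregated container logs can sometimes still be pulled from the command line. A diagnostic sketch; it assumes YARN log aggregation is enabled and uses the application ID from the output below (the remote log path is HDP's default and may differ on your cluster):

```shell
# Fetch aggregated YARN container logs for the failed application
# (only available after the application finishes and logs are aggregated)
yarn logs -applicationId application_1483479696331_0001

# Check whether aggregated logs exist at all for this application
# (path assumes HDP's default yarn.nodemanager.remote-app-log-dir=/app-logs)
hdfs dfs -ls /app-logs/hive/logs/application_1483479696331_0001
```

If the second command shows nothing, the containers likely died before producing logs, which is consistent with a localization (exitCode -1000) failure.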

[admin@hadoop1 Spark-Python]$ env | grep PYTHON
PYTHONPATH=/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip:/usr/hdp/2.5.0.0-1245/spark2/python
[admin@hadoop1 Spark-Python]$ ls /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip
/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip
[admin@hadoop1 Spark-Python]$ spark-submit --master yarn --deploy-mode cluster --queue adhoc  Friends-By-Age.py
SPARK_MAJOR_VERSION is set to 2, using Spark2
17/01/04 11:04:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/01/04 11:04:50 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
17/01/04 11:04:50 INFO RMProxy: Connecting to ResourceManager at hadoop2.example.com/10.100.44.16:8050
17/01/04 11:04:51 INFO Client: Requesting a new application from cluster with 5 NodeManagers
17/01/04 11:04:51 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (7936 MB per container)
17/01/04 11:04:51 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
17/01/04 11:04:51 INFO Client: Setting up container launch context for our AM
17/01/04 11:04:51 INFO Client: Setting up the launch environment for our AM container
17/01/04 11:04:51 INFO Client: Preparing resources for our AM container
17/01/04 11:04:51 INFO YarnSparkHadoopUtil: getting token for namenode: hdfs://hadoop1.example.com:8020/user/hive/.sparkStaging/application_1483479696331_0001
17/01/04 11:04:51 INFO DFSClient: Created HDFS_DELEGATION_TOKEN token 74 for hive on 10.100.44.17:8020
17/01/04 11:04:55 INFO metastore: Trying to connect to metastore with URI thrift://hadoop2.example.com:9083
17/01/04 11:04:55 INFO metastore: Connected to metastore.
17/01/04 11:04:55 INFO YarnSparkHadoopUtil: HBase class not found java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
17/01/04 11:04:55 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs:///hdp/apps/2.5.0.0-1245/spark2/spark2-hdp-yarn-archive.tar.gz
17/01/04 11:04:55 INFO Client: Source and destination file systems are the same. Not copying hdfs:/hdp/apps/2.5.0.0-1245/spark2/spark2-hdp-yarn-archive.tar.gz
17/01/04 11:04:55 INFO Client: Uploading resource file:/home/admin/Spark-Python/Friends-By-Age.py -> hdfs://hadoop1.example.com:8020/user/hive/.sparkStaging/application_1483479696331_0001/Friends-By-Age.py
17/01/04 11:04:55 INFO Client: Uploading resource file:/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip -> hdfs://hadoop1.example.com:8020/user/hive/.sparkStaging/application_1483479696331_0001/pyspark.zip
17/01/04 11:04:55 INFO Client: Uploading resource file:/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip -> hdfs://hadoop1.example.com:8020/user/hive/.sparkStaging/application_1483479696331_0001/py4j-0.10.1-src.zip
17/01/04 11:04:56 INFO Client: Uploading resource file:/tmp/spark-6a78d3da-090b-4c4d-84ad-730cd6bdf0a8/__spark_conf__1129595364919104787.zip -> hdfs://hadoop1.example.com:8020/user/hive/.sparkStaging/application_1483479696331_0001/__spark_conf__.zip
17/01/04 11:04:56 INFO SecurityManager: Changing view acls to: admin,hive
17/01/04 11:04:56 INFO SecurityManager: Changing modify acls to: admin,hive
17/01/04 11:04:56 INFO SecurityManager: Changing view acls groups to:
17/01/04 11:04:56 INFO SecurityManager: Changing modify acls groups to:
17/01/04 11:04:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(admin, hive); groups with view permissions: Set(); users  with modify permissions: Set(admin, hive); groups with modify permissions: Set()
17/01/04 11:04:56 INFO Client: Submitting application application_1483479696331_0001 to ResourceManager
17/01/04 11:04:57 INFO YarnClientImpl: Submitted application application_1483479696331_0001
17/01/04 11:04:58 INFO Client: Application report for application_1483479696331_0001 (state: ACCEPTED)
17/01/04 11:04:58 INFO Client:
         client token: N/A
         diagnostics: [Wed Jan 04 11:04:58 -0500 2017] Scheduler has assigned a container for AM, waiting for AM container to be launched
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: adhoc
         start time: 1483545897646
         final status: UNDEFINED
         tracking URL: http://hadoop2.example.com:8088/proxy/application_1483479696331_0001/
         user: hive
17/01/04 11:05:26 INFO Client: Application report for application_1483479696331_0001 (state: ACCEPTED)
17/01/04 11:05:27 INFO Client: Application report for application_1483479696331_0001 (state: ACCEPTED)
17/01/04 11:05:28 INFO Client: Application report for application_1483479696331_0001 (state: FAILED)
17/01/04 11:05:28 INFO Client:
         client token: N/A
         diagnostics: Application application_1483479696331_0001 failed 2 times due to AM Container for appattempt_1483479696331_0001_000002 exited with  exitCode: -1000
For more detailed output, check the application tracking page: http://hadoop2.example.com:8088/cluster/app/application_1483479696331_0001 Then click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://hadoop1.example.com:8020/user/hive/.sparkStaging/application_1483479696331_0001/py4j-0.10.1-src.zip
java.io.FileNotFoundException: File does not exist: hdfs://hadoop1.example.com:8020/user/hive/.sparkStaging/application_1483479696331_0001/py4j-0.10.1-src.zip


Re: job fails when submitted via yarn scheduler

Master Collaborator

Any gurus listening?


Re: job fails when submitted via yarn scheduler

Expert Contributor

Your Spark job is failing with the exception below:

17/01/04 11:05:28 INFO Client:
         client token: N/A
         diagnostics: Application application_1483479696331_0001 failed 2 times due to AM Container for appattempt_1483479696331_0001_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page: http://hadoop2.example.com:8088/cluster/app/application_1483479696331_0001 Then click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://hadoop1.example.com:8020/user/hive/.sparkStaging/application_1483479696331_0001/py4j-0.10.1-src.zip
java.io.FileNotFoundException: File does not exist: hdfs://hadoop1.example.com:8020/user/hive/.sparkStaging/application_1483479696331_0001/py4j-0.10.1-src.zip

Looking earlier in the log, the client does upload the file to HDFS:

17/01/04 11:04:55 INFO Client: Uploading resource file:/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip -> hdfs://hadoop1.example.com:8020/user/hive/.sparkStaging/application_1483479696331_0001/py4j-0.10.1-src.zip

Since you are running the Spark job as the "admin" user, can you check whether you have permission to write to that HDFS path? Can you also try the same job as the "hive" user and check?
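A quick way to check, for example (paths taken from the log above; the test filename is arbitrary):

```shell
# Show ownership and permissions of the staging parent directories
# (-d lists the directories themselves rather than their contents)
hdfs dfs -ls -d /user/hive /user/hive/.sparkStaging

# Try writing and removing a throwaway file as the submitting user
hdfs dfs -touchz /user/hive/.sparkStaging/perm_test && \
  hdfs dfs -rm /user/hive/.sparkStaging/perm_test
```

If the touchz succeeds, write permission on the staging directory is probably not the problem.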

Re: job fails when submitted via yarn scheduler

Master Collaborator

Tried as the hive user; same error.

         diagnostics: Application application_1483479696331_0014 failed 2 times due to AM Container for appattempt_1483479696331_0014_000002 exited with  exitCode: -1000
For more detailed output, check the application tracking page: http://hadoop2.example.com:8088/cluster/app/application_1483479696331_0014 Then click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://hadoop1.example.com:8020/user/hive/.sparkStaging/application_1483479696331_0014/py4j-0.10.1-src.zip
java.io.FileNotFoundException: File does not exist: hdfs://hadoop1.example.com:8020/user/hive/.sparkStaging/application_1483479696331_0014/py4j-0.10.1-src.zip

Re: job fails when submitted via yarn scheduler

Master Collaborator

Does this file have to be present on all nodes in the cluster?

/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip

Also, as the hive user I can write to the folder in HDFS:

[hive@hadoop1 ~]$ hdfs dfs -copyFromLocal b.sql /user/hive/.sparkStaging/
[hive@hadoop1 ~]$ hdfs dfs -ls /user/hive/.sparkStaging/
Found 1 items
-rw-------   3 hive hdfs         13 2017-01-05 09:38 /user/hive/.sparkStaging/b.sql
[hive@hadoop1 ~]$
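As the upload lines in the log show, the py4j and pyspark zips only need to exist on the submitting host; spark-submit ships them to the .sparkStaging directory itself and the NodeManagers localize them from there. If that staged copy is disappearing before the AM container can localize it, one thing worth trying is shipping the zips explicitly (a sketch; paths taken from the PYTHONPATH shown earlier in the thread):

```shell
# Explicitly ship pyspark and py4j with the job instead of relying on
# the client's automatic upload to .sparkStaging
spark-submit --master yarn --deploy-mode cluster --queue adhoc \
  --py-files /usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip,/usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip \
  Friends-By-Age.py
```

If this fails the same way, the problem is more likely something removing the staging directory (or a mismatch in the default filesystem between the client and the NodeManagers) than a missing local file.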

Re: job fails when submitted via yarn scheduler

Master Collaborator

Anyone, please?

Re: job fails when submitted via yarn scheduler

Super Guru

@Sandeep Nemuri - Can you please check and help?
