Archives of Support Questions (Read Only)

This is an archived board kept for historical reference. Information and links may no longer be available or relevant.

Spark2 - Getting 'Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster' Error when valid spark2-hdp-yarn-archive.tar.gz is present

Expert Contributor

I was getting a zero-length error on /usr/hdp/apps/spark2/spark2-hdp-yarn-archive.tar.gz, which is a documented issue after some upgrades. So I created the archive and uploaded it to HDFS with the following commands:

tar -zcvf spark2-hdp-yarn-archive.tar.gz /usr/hdp/current/spark2-client/jars/* 
hadoop fs -put spark2-hdp-yarn-archive.tar.gz /hdp/apps/2.5.3.0-37/spark2/

Now when running any spark job in yarn (say the example pi app), I get the following error:

Error: 'Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster' 

Other info:

  • This is HDP 2.5.3 running Spark 2.1, upgraded from HDP 2.2.8 -> 2.4.3 -> 2.5.3.
  • I believe the missing class is in spark/lib/spark-hdp-assembly.jar, but that file does not exist.

HERE'S THE WEIRD PART - If I completely remove the spark2-hdp-yarn-archive.tar.gz from HDFS then Spark jobs start to run again!

So, here are the questions:

  • Is this file (spark2-hdp-yarn-archive.tar.gz) needed?
  • If so, is there any direction on correcting this error?

Thanks in advance!

1 ACCEPTED SOLUTION

Expert Contributor

That file is needed only for performance reasons: it works like a cache. Without it, the jars have to be uploaded every time an application starts.
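
For context, Spark finds this archive through the spark.yarn.archive property, typically set in spark-defaults.conf. A sketch of the usual setting (the exact HDFS path is an assumption; match it to your cluster's HDP version directory):

# spark-defaults.conf (path assumed; adjust to your cluster)
spark.yarn.archive hdfs:///hdp/apps/2.5.3.0-37/spark2/spark2-hdp-yarn-archive.tar.gz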

Your problem might be that your tar.gz contains a root folder. In that case, if you list the files in the archive, you should see something like:

./one.jar
./another.jar
...

Instead, there should be no root folder, and the listing should be:

one.jar
another.jar
...

If that is the case, here are some examples of how to do it: https://stackoverflow.com/questions/939982/how-do-i-tar-a-directory-of-files-and-folders-without-inc....
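
A minimal sketch of the difference (using temporary throwaway files rather than the real HDP jars) that shows how the way tar is invoked changes the entry names:

```shell
# Create a throwaway directory with dummy "jars" to demonstrate the layouts.
dir=$(mktemp -d)
touch "$dir/one.jar" "$dir/another.jar"

# Archiving by path keeps a directory prefix on every entry (the bad layout):
tar -zcf /tmp/with-prefix.tar.gz "$dir"/*.jar 2>/dev/null
tar -tzf /tmp/with-prefix.tar.gz    # entries look like tmp/tmp.XXXX/one.jar

# Archiving from inside the directory puts the jars at the archive root:
(cd "$dir" && tar -zcf /tmp/no-prefix.tar.gz *)
tar -tzf /tmp/no-prefix.tar.gz      # entries are just one.jar, another.jar
```

Only the second layout gives YARN the class files where it expects them, since the jars must sit at the top level of the extracted archive.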

Hope this helps.


Expert Contributor

Thanks! A very subtle difference, but obviously important to Spark! For everyone's reference, this tar command creates a tar.gz with the jars at the root of the archive:

cd /usr/hdp/current/spark2-client/jars/
tar -zcvf /tmp/spark2-hdp-yarn-archive.tar.gz *

# List the files in the archive. Note that they are at the root!
tar -tvf /tmp/spark2-hdp-yarn-archive.tar.gz 
-rw-r--r-- root/root     69409 2016-11-30 03:31 activation-1.1.1.jar
-rw-r--r-- root/root    445288 2016-11-30 03:31 antlr-2.7.7.jar
-rw-r--r-- root/root    302248 2016-11-30 03:31 antlr4-runtime-4.5.3.jar
-rw-r--r-- root/root    164368 2016-11-30 03:31 antlr-runtime-3.4.jar
...
# Then upload to hdfs, fix ownership and permissions if needed, and good to go!
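
For the upload step mentioned above, a hedged sketch (the target path, ownership, and mode are assumptions; adjust them to your cluster's HDP version directory and policies):

# Assumed target path for HDP 2.5.3.0-37; it should match spark.yarn.archive.
hdfs dfs -put -f /tmp/spark2-hdp-yarn-archive.tar.gz /hdp/apps/2.5.3.0-37/spark2/
# Make the archive readable by all YARN containers (ownership/mode assumed):
hdfs dfs -chown hdfs:hadoop /hdp/apps/2.5.3.0-37/spark2/spark2-hdp-yarn-archive.tar.gz
hdfs dfs -chmod 444 /hdp/apps/2.5.3.0-37/spark2/spark2-hdp-yarn-archive.tar.gz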