Spark2 - Getting 'Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster' Error when valid spark2-hdp-yarn-archive.tar.gz is present

Expert Contributor

I was getting a zero-length error on /usr/hdp/apps/spark2/spark2-hdp-yarn-archive.tar.gz, which is a documented issue after some upgrades. So I created the archive and uploaded it to HDFS using the following commands:

tar -zcvf spark2-hdp-yarn-archive.tar.gz /usr/hdp/current/spark2-client/jars/* 
hadoop fs -put spark2-hdp-yarn-archive.tar.gz /hdp/apps/2.5.3.0-37/spark2/

Now, when running any Spark job on YARN (say, the example Pi app), I get the following error:

Error: 'Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster' 

Other info:

  • This is HDP 2.5.3 running Spark 2.1, upgraded from HDP 2.2.8 -> 2.4.3 -> 2.5.3.
  • I believe the missing class used to live in spark/lib/spark-hdp-assembly.jar, but that file does not exist.

HERE'S THE WEIRD PART - if I completely remove spark2-hdp-yarn-archive.tar.gz from HDFS, then Spark jobs start to run again!

So, here are the questions:

  • Is this file (spark2-hdp-yarn-archive.tar.gz) needed?
  • If so, any direction on correcting this error would be appreciated.

Thanks in advance!

1 ACCEPTED SOLUTION

Expert Contributor

That file is needed only for performance reasons. It works like a cache: without it, the jars have to be uploaded every time an application starts.

Your problem might be that your tar.gz contains a root folder. In that case, listing the files in the archive shows something like:

./one.jar
./another.jar
...

Instead, there should be no root folder, and the listing should show:

one.jar
another.jar
...

If this is the case, this Stack Overflow question has some examples of how to do it: https://stackoverflow.com/questions/939982/how-do-i-tar-a-directory-of-files-and-folders-without-inc....
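If it helps, here is a minimal sketch of the difference, using a throwaway directory of fake jars (the real archive would be built from /usr/hdp/current/spark2-client/jars/):

```shell
#!/bin/sh
# Sketch with fake jars in a scratch directory; the real jars live in
# /usr/hdp/current/spark2-client/jars/.
set -e
WORK=$(mktemp -d)
mkdir -p "$WORK/jars"
touch "$WORK/jars/one.jar" "$WORK/jars/another.jar"

# Wrong: archiving the directory itself gives every entry a folder prefix.
tar -zcf "$WORK/wrong.tar.gz" -C "$WORK" jars
tar -tzf "$WORK/wrong.tar.gz"     # entries like jars/one.jar

# Right: create the archive from inside the jars directory, so the
# entries sit at the root of the archive.
( cd "$WORK/jars" && tar -zcf "$WORK/right.tar.gz" * )
tar -tzf "$WORK/right.tar.gz"     # entries like one.jar
```

The listing of the "right" archive should contain no slashes at all, which is what YARN expects when it unpacks the archive to find the Spark jars.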

Hope this helps.


2 REPLIES

Expert Contributor

Thanks! A very subtle difference, but obviously important to Spark! For everyone's reference, this tar command creates a tar.gz with the jars in the root of the archive:

cd /usr/hdp/current/spark2-client/jars/
tar -zcvf /tmp/spark2-hdp-yarn-archive.tar.gz *

# List the files in the archive. Note that they are in the root!
tar -tvf /tmp/spark2-hdp-yarn-archive.tar.gz 
-rw-r--r-- root/root     69409 2016-11-30 03:31 activation-1.1.1.jar
-rw-r--r-- root/root    445288 2016-11-30 03:31 antlr-2.7.7.jar
-rw-r--r-- root/root    302248 2016-11-30 03:31 antlr4-runtime-4.5.3.jar
-rw-r--r-- root/root    164368 2016-11-30 03:31 antlr-runtime-3.4.jar
...
# Then upload to hdfs, fix ownership and permissions if needed, and good to go!
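As a sketch of that last step (the HDFS path is taken from the question; the hdfs:hadoop owner and 444 mode are the usual HDP defaults for files under /hdp/apps, so verify them against another file on your cluster before applying):

```shell
# Replace the old archive on HDFS with the freshly built one.
hadoop fs -rm /hdp/apps/2.5.3.0-37/spark2/spark2-hdp-yarn-archive.tar.gz
hadoop fs -put /tmp/spark2-hdp-yarn-archive.tar.gz /hdp/apps/2.5.3.0-37/spark2/

# Assumed HDP defaults for files under /hdp/apps -- check yours first.
hadoop fs -chown hdfs:hadoop /hdp/apps/2.5.3.0-37/spark2/spark2-hdp-yarn-archive.tar.gz
hadoop fs -chmod 444 /hdp/apps/2.5.3.0-37/spark2/spark2-hdp-yarn-archive.tar.gz
```

These commands need a live cluster, so treat them as a template rather than something to paste blindly.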