Created 10-04-2017 10:25 PM
I was getting a zero-length error on /usr/hdp/apps/spark2/spark2-hdp-yarn-archive.tar.gz, which is documented as an issue after some upgrades. So I created and uploaded the file to hdfs using the following commands:
tar -zcvf spark2-hdp-yarn-archive.tar.gz /usr/hdp/current/spark2-client/jars/*
hadoop fs -put spark2-hdp-yarn-archive.tar.gz /hdp/apps/2.5.3.0-37/spark2/
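(Worth noting: when tar is given absolute paths like this, it strips only the leading "/" and keeps the rest of the directory structure inside the archive. A small self-contained illustration with throwaway files in /tmp, not the real Spark jars:)

```shell
# Throwaway files standing in for the Spark jars (hypothetical paths).
mkdir -p /tmp/demo/jars
touch /tmp/demo/jars/one.jar

# tar with an absolute path: the leading "/" is stripped, but the
# directory structure is preserved inside the archive.
tar -zcf /tmp/abs.tar.gz /tmp/demo/jars/*
tar -tzf /tmp/abs.tar.gz    # entries look like tmp/demo/jars/one.jar
```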
Now when running any spark job in yarn (say the example pi app), I get the following error:
Error: 'Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster'
Other info:
HERE'S THE WEIRD PART - If I completely remove the spark2-hdp-yarn-archive.tar.gz from HDFS then Spark jobs start to run again!
So, here are the questions:
Thanks in advance!
Created 10-06-2017 07:57 AM
That file is needed only for performance reasons: it works like a cache. Without it, the jars have to be uploaded every time an application starts.
Your problem might be that your tar.gz contains a root folder. In that case, if you list the files in the archive, you should see something like
./one.jar ./another.jar ...
Instead, you should have no root folder, and listing the files should be:
one.jar another.jar ...
If this is the case, here are some examples of how to do it: https://stackoverflow.com/questions/939982/how-do-i-tar-a-directory-of-files-and-folders-without-inc....
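To make the difference concrete, here is a small self-contained demonstration (throwaway files in /tmp, not the real Spark jars) of creating an archive with and without a root folder:

```shell
# Throwaway files standing in for the Spark jars.
mkdir -p /tmp/demo-jars
touch /tmp/demo-jars/one.jar /tmp/demo-jars/another.jar

# With a root folder: archiving the directory itself.
tar -zcf /tmp/with-root.tar.gz -C /tmp demo-jars
tar -tzf /tmp/with-root.tar.gz   # entries are prefixed: demo-jars/one.jar ...

# Without a root folder: cd into the directory and archive its contents.
cd /tmp/demo-jars
tar -zcf /tmp/flat.tar.gz *
tar -tzf /tmp/flat.tar.gz        # entries sit at the archive root
```

Only the second form works for spark2-hdp-yarn-archive.tar.gz, since YARN expects the jars at the root of the extracted archive.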
Hope this helps.
Created 10-07-2017 05:52 PM
Thanks! Very subtle difference, but obviously important to Spark! For everyone's reference, these commands can be used to create a tar.gz with the jars at the root of the archive:
cd /usr/hdp/current/spark2-client/jars/
tar -zcvf /tmp/spark2-hdp-yarn-archive.tar.gz *

# List the files in the archive. Note that they are in the root!
tar -tvf /tmp/spark2-hdp-yarn-archive.tar.gz
-rw-r--r-- root/root     69409 2016-11-30 03:31 activation-1.1.1.jar
-rw-r--r-- root/root    445288 2016-11-30 03:31 antlr-2.7.7.jar
-rw-r--r-- root/root    302248 2016-11-30 03:31 antlr4-runtime-4.5.3.jar
-rw-r--r-- root/root    164368 2016-11-30 03:31 antlr-runtime-3.4.jar
...

# Then upload to hdfs, fix ownership and permissions if needed, and good to go!