I was getting a zero-length error on /usr/hdp/apps/spark2/spark2-hdp-yarn-archive.tar.gz, which is a documented issue
after some upgrades. So I re-created the file and uploaded it to HDFS using the following commands:
tar -zcvf spark2-hdp-yarn-archive.tar.gz /usr/hdp/current/spark2-client/jars/*
hadoop fs -put spark2-hdp-yarn-archive.tar.gz /hdp/apps/220.127.116.11-37/spark2/
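One possible cause worth ruling out (an assumption, not confirmed in this post): Spark's "Running on YARN" documentation says the archive named by `spark.yarn.archive` should contain the jar files in its root directory. Because the tar command above is given absolute paths, GNU tar strips only the leading `/` and stores entries as `usr/hdp/current/spark2-client/jars/<name>.jar`. That nests every jar, including the `spark-yarn_*.jar` that carries `ApplicationMaster` in Spark 2.x (Spark 2.x ships individual jars under `jars/` and has no assembly jar). The minimal sketch below uses hypothetical helper names (`build_spark_archive`, `check_spark_archive`; they are not HDP tooling). It rebuilds the archive with `tar -C` so the jars sit at the root, then lists the entries and flags any nested paths.

```shell
#!/usr/bin/env bash
# Minimal sketch: rebuild the Spark 2 YARN archive with the jars at its root,
# then verify the layout. Function names are hypothetical, not HDP tooling.
set -euo pipefail

# build_spark_archive JARS_DIR OUT
# -C switches into JARS_DIR before adding ".", so entries are stored as
# "./<name>.jar" instead of "usr/hdp/current/spark2-client/jars/<name>.jar".
# OUT should live outside JARS_DIR so tar does not try to archive itself.
build_spark_archive() {
  local jars_dir="$1" out="$2"
  tar -czf "$out" -C "$jars_dir" .
}

# check_spark_archive ARCHIVE
# Returns 1 if the archive cannot be listed, or if any entry sits below a
# subdirectory (a leading "./" is ignored); nested entries go to stderr.
check_spark_archive() {
  local archive="$1"
  local listing nested
  listing=$(tar -tzf "$archive") || return 1
  nested=$(printf '%s\n' "$listing" | sed 's|^\./||' | grep '/' || true)
  if [ -n "$nested" ]; then
    echo "Nested entries in $archive:" >&2
    printf '%s\n' "$nested" >&2
    return 1
  fi
  return 0
}

# Example usage on an HDP node (<hdp-version> is a placeholder):
#   build_spark_archive /usr/hdp/current/spark2-client/jars /tmp/spark2-hdp-yarn-archive.tar.gz
#   check_spark_archive /tmp/spark2-hdp-yarn-archive.tar.gz
#   hadoop fs -put -f /tmp/spark2-hdp-yarn-archive.tar.gz /hdp/apps/<hdp-version>/spark2/
```

As far as I understand Spark 2.x's YARN client, the unpacked archive is localized as `__spark_libs__` and added to the container classpath as `__spark_libs__/*`. A Java classpath wildcard matches only jar files directly inside that directory, so nested jars would not be seen. That would be consistent with the `ApplicationMaster` error, but treat it as a hypothesis to check, not a diagnosis.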
Now, when I run any Spark job on YARN (say, the example Pi app), I get the following error:
Error: 'Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster'
- This is HDP 2.5.3 running Spark 2.1
- Upgraded from HDP 2.2.8 -> 2.4.3 -> 2.5.3
- I believe the missing class is in spark/lib/spark-hdp-assembly.jar, but that file does not exist.
HERE'S THE WEIRD PART: if I completely remove spark2-hdp-yarn-archive.tar.gz from HDFS, Spark jobs start running again!
So, here are the questions:
- Is this file (spark2-hdp-yarn-archive.tar.gz) needed?
- If so, how can I correct this error?
Thanks in advance!