12-11-2014 01:26 PM
We are trying to submit a Spark application from a Tomcat application that runs our business logic. The Tomcat app lives in a separate, non-Hadoop cluster. We initially did this by using the spark-yarn package to call Client#runApp() directly, but found that the API we were using is being made private in future Spark releases.
Our current solution is a very simple YARN application that executes, as its command, "spark-submit --master yarn-cluster s3n://application/jar.jar ...". This seemed simple and elegant, but it has some weird issues: we get NoClassDefFoundErrors. When we ssh to the box and run the same spark-submit command by hand, it works, but running it through YARN leads to the NoClassDefFoundErrors mentioned.
Also, comparing the environment variables and Java properties of the working and broken runs, we find that the two have different Java classpaths. So weird...
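To show what that comparison looks like, here is a rough sketch in Python. It assumes the java.class.path value from each run has been captured as a plain string; the function name diff_classpaths and the sample paths are made up for illustration and are not our real values.

```python
def diff_classpaths(working, broken, sep=":"):
    """Return (only_in_working, only_in_broken) as sorted lists of entries.

    Both arguments are raw classpath strings, e.g. the value of the
    java.class.path system property captured from each run.
    """
    # Split on the separator and drop empty entries (e.g. from "a::b").
    working_entries = {entry for entry in working.split(sep) if entry}
    broken_entries = {entry for entry in broken.split(sep) if entry}
    return (
        sorted(working_entries - broken_entries),
        sorted(broken_entries - working_entries),
    )


if __name__ == "__main__":
    # Hypothetical sample values; paste the real strings from each run here.
    working = "/etc/hadoop/conf:/opt/spark/lib/spark-assembly.jar:/opt/app/app.jar"
    broken = "/etc/hadoop/conf:/opt/app/app.jar"
    missing, extra = diff_classpaths(working, broken)
    print("only in the ssh run: ", missing)
    print("only in the YARN run:", extra)
```

Entries that appear only in the working run are the first place to look for the jars containing the classes named in the NoClassDefFoundErrors.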
Has anyone had this problem, or does anyone know a solution? We would be happy to post our very simple code for creating the YARN application.