Running spark-submit from a remote machine using a YARN application

We are trying to submit a Spark application from a Tomcat application that runs our business logic. The Tomcat app lives in a separate, non-Hadoop cluster. We first did this by using the spark-yarn package to call Client#runApp() directly, but found that the API we were using in Spark is being made private in future releases.

Our current solution is a very simple YARN application that executes, as its command, "spark-submit --master yarn-cluster s3n://application/jar.jar ...". This seemed simple and elegant, but it has a strange problem: the job fails with NoClassDefFoundError. When we ssh to the box and run the same spark-submit command by hand, it works, but running it through YARN results in the NoClassDefFoundError.

Also, comparing the environment variables and Java system properties of the working and broken runs, we find that the two have different Java classpaths, which we cannot explain.
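
For reference, here is a minimal sketch of the kind of comparison described above. It assumes the snapshot is taken inside whichever JVM is being compared, once in the working run and once in the broken one. The class name `RuntimeSnapshot` and its methods are hypothetical helpers written for this illustration. They are not part of Spark or YARN, and this is not the exact code we ran.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical helper (not a Spark or YARN API): renders the environment and
// classpath as plain text, one item per line, so two runs can be diffed.
final class RuntimeSnapshot {

    // Splits a classpath string into its entries, in order, skipping empty ones.
    static List<String> classpathEntries(String classpath, String separator) {
        List<String> entries = new ArrayList<>();
        if (classpath == null || classpath.isEmpty()) {
            return entries;
        }
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    // Environment variables sorted by name, then classpath entries in their
    // original order (order matters for class loading, so it is preserved).
    static String render(Map<String, String> env, String classpath, String separator) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(env).entrySet()) {
            out.append("env ").append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        for (String entry : classpathEntries(classpath, separator)) {
            out.append("cp ").append(entry).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Snapshot of the current JVM; redirect to a file in each context.
        System.out.print(render(System.getenv(),
                System.getProperty("java.class.path"),
                System.getProperty("path.separator")));
    }
}
```

Writing the output to a file in each context and diffing the two files makes it easy to see which environment variables or classpath entries are missing under YARN.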

 

Has anyone run into this problem, or does anyone know a solution? We would be happy to post our very simple code for creating the YARN application.

Thanks!