I deployed a Spark Standalone cluster.
Let's take this configuration:
- "myComputer" is a Windows machine running a Spark environment through an IDE (IntelliJ)
- "mySparkMaster" is a Linux machine hosting the Spark Master node
- "mySparkWorker" is one of the Linux machines hosting one of the Spark Worker nodes
I built a new application on "myComputer" and packaged it as a JAR archive.
If I place the JAR on the "mySparkMaster" machine and run it with spark-submit, it works:
spark-submit -v --class scala.TestApp --master spark://mySparkServer:7077 local/jar/TestApp.jar
- My application takes a long time to finish, and performance stays the same even when I shut down most of the available Spark Workers. This suggests that the application is not running on the Spark cluster but only on the Spark instance of the mySparkMaster machine. Moreover, I can find no execution trace in the mySparkWorker machine logs.
- I cannot see the execution history in the Spark web UI when the application is submitted from mySparkMaster (but I can see it when I submit the application from myComputer).
Question: how can I make sure my application is submitted to the Spark cluster so that it is scheduled on all available Spark Workers?
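One thing that may be worth ruling out for the symptoms above is a master URL hard-coded in the application itself: properties set directly on a SparkConf in code take precedence over flags passed to spark-submit, so a setMaster("local[*]") call would override --master. The sketch below only searches the sources for such calls. The helper name find_hardcoded_master and the src/main path are assumptions for illustration, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: list every place where the sources call setMaster.
# A hit such as setMaster("local[*]") would override the --master flag.
find_hardcoded_master() {
    src_dir="$1"
    # Search Scala and Java sources only. /dev/null is passed as an extra
    # file so grep always prints the file name, even for a single match.
    find "$src_dir" -type f \( -name '*.scala' -o -name '*.java' \) \
        -exec grep -n 'setMaster' /dev/null {} +
}

# Example (path is an assumption about the project layout):
# find_hardcoded_master src/main
```

If this prints nothing, the master URL comes from spark-submit or spark-defaults.conf, and the cause lies elsewhere.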
If I place the JAR archive on the "mySparkWorker" machine and run it with spark-submit and --deploy-mode cluster, it fails, apparently because it tries to post the application to a REST web service:
spark-submit -v --class scala.TestApp --master spark://mySparkServer:7077 --deploy-mode cluster --supervise local/jar/TestApp.jar
Running Spark using the REST application submission protocol.
Main class:
org.apache.spark.deploy.rest.RestSubmissionClient
Arguments:
file:/home/myUser/local/jar/TestApp.jar
scala.TestApp
System properties:
SPARK_SUBMIT -> true
spark.driver.supervise -> true
spark.app.name -> scala.TestApp
spark.jars -> file:/home/myUser/local/jar/TestApp.jar
spark.master -> spark://mySparkMaster:7077
Classpath elements:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/09/03 13:37:04 INFO RestSubmissionClient: Submitting a request to launch an application in spark://mySparkMaster:7077.
15/09/03 13:37:04 WARN RestSubmissionClient: Unable to connect to server spark://mySparkMaster:7077.
Warning: Master endpoint spark://mySparkMaster:7077 was not a REST server. Falling back to legacy submission gateway instead.
Main class:
org.apache.spark.deploy.Client
Arguments:
--supervise
launch
spark://mySparkSMaster:7077
file:/home/myUser/local/jar/TestApp.jar
scala.JoinPerfs
System properties:
SPARK_SUBMIT -> true
spark.driver.supervise -> true
spark.app.name -> scala.JoinPerfs
spark.jars -> file:/home/myUser/local/jar/TestApp.jar
spark.master -> spark://mySparkMaster:7077
Classpath elements:
15/09/03 13:37:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Driver successfully submitted as driver-20150903143706-0004
... waiting before polling master for driver state
... polling master for driver state
State of driver-20150903143706-0004 is ERROR
Exception from cluster was: java.io.IOException: Failed to create directory /Appvg/spark-1.4.1/work/driver-20150903143706-0004
java.io.IOException: Failed to create directory /Appvg/spark-1.4.1/work/driver-20150903143706-0004
    at org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$createWorkingDirectory(DriverRunner.scala:130)
    at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:78)
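The final exception says the Worker could not create the driver directory under /Appvg/spark-1.4.1/work, which may point to a permission problem on that directory rather than to the REST fallback itself. A minimal check, assuming it is run on the worker machine as the same user that runs the Worker process (the helper name check_work_dir is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: report whether the current user could create a
# driver directory under the given Spark work directory.
check_work_dir() {
    work_dir="$1"
    if [ ! -d "$work_dir" ]; then
        echo "missing: $work_dir"
        return 1
    fi
    # Creating a subdirectory needs both write and search (execute)
    # permission on the parent directory.
    if [ -w "$work_dir" ] && [ -x "$work_dir" ]; then
        echo "writable: $work_dir"
        return 0
    fi
    echo "not writable: $work_dir"
    return 1
}

# Example, using the path from the exception above:
# check_work_dir /Appvg/spark-1.4.1/work
```

A "not writable" result would be consistent with the IOException. A "writable" result would suggest looking at which user actually owns the Worker process or at the free space on that volume.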
Thanks for your help :)