
Spark standalone - Run a Spark app using many workers


Hello,

 

I deployed a Spark Standalone cluster.

 

Let's take this configuration:

- "myComputer" is a Windows machine running a Spark environment through an IDE (IntelliJ)

- "mySparkMaster" is a Linux machine hosting the Spark Master node

- "mySparkWorker" is one of the Linux machines hosting one of the Spark Worker nodes

 

I build a new application on "myComputer" and package it as a JAR archive.
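 

For reference, here is a stripped-down sketch of the kind of entry point I am submitting (illustrative only, the real application is larger; in this sketch the master URL is deliberately not hard-coded in the SparkConf, so the --master flag passed to spark-submit decides where it runs):

package scala  // matches the --class scala.TestApp argument below

import org.apache.spark.{SparkConf, SparkContext}

object TestApp {
  def main(args: Array[String]): Unit = {
    // No setMaster(...) here: the master is taken from spark-submit's --master flag.
    val conf = new SparkConf().setAppName("TestApp")
    val sc = new SparkContext(conf)

    // Dummy workload, just enough to generate tasks for the executors.
    val counts = sc.parallelize(1 to 1000000, numSlices = 48)
      .map(i => (i % 10, 1L))
      .reduceByKey(_ + _)
      .collect()

    counts.foreach(println)
    sc.stop()
  }
}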

If I place the JAR on the "mySparkMaster" machine and run it using spark-submit, it works:

 

spark-submit -v --class scala.TestApp --master spark://mySparkMaster:7077 local/jar/TestApp.jar

 

Problems:

- My application takes a lot of time to finish, and performance stays the same even if I shut down most of the available Spark Workers. This tends to show that my application is not run on the Spark cluster but only on the Spark instance of the mySparkMaster machine. Moreover, I can find no execution trace in the mySparkWorker machine logs...

- I cannot see the execution history in the Spark web UI when the application is submitted from mySparkMaster (but I can see the execution history when I submit the application from myComputer...)

 

Question: how can I be sure that my application is submitted to the Spark cluster so that it is scheduled on all available Spark Workers?
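 

To make the question concrete: is there a way to check, from the job itself, which machines the tasks actually run on? As a debugging sketch (assuming access to the SparkContext sc inside the application), I was thinking of temporarily adding something like this:

import java.net.InetAddress

// Runs a trivial job and records the hostname inside each task,
// so the result shows which machines actually executed tasks.
val hostsThatRanTasks = sc.parallelize(1 to 10000, numSlices = 48)
  .map(_ => InetAddress.getLocalHost.getHostName)
  .distinct()
  .collect()

println("Tasks ran on: " + hostsThatRanTasks.mkString(", "))

// Driver-side view of the registered executors (keys are "host:port"):
sc.getExecutorMemoryStatus.keys.foreach(println)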

 

If I place the JAR archive on the "mySparkWorker" machine and run it using spark-submit with --deploy-mode cluster, it fails; it seems to post the application to a REST web service:

 

spark-submit -v --class scala.TestApp --master spark://mySparkMaster:7077 --deploy-mode cluster --supervise local/jar/TestApp.jar
Running Spark using the REST application submission protocol.
Main class:
org.apache.spark.deploy.rest.RestSubmissionClient
Arguments:
file:/home/myUser/local/jar/TestApp.jar
scala.TestApp
System properties:
SPARK_SUBMIT -> true
spark.driver.supervise -> true
spark.app.name -> scala.TestApp
spark.jars -> file:/home/myUser/local/jar/TestApp.jar
spark.master -> spark://mySparkMaster:7077
Classpath elements:



Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/09/03 13:37:04 INFO RestSubmissionClient: Submitting a request to launch an application in spark://mySparkMaster:7077.
15/09/03 13:37:04 WARN RestSubmissionClient: Unable to connect to server spark://mySparkMaster:7077.
Warning: Master endpoint spark://mySparkMaster:7077 was not a REST server. Falling back to legacy submission gateway instead.
Main class:
org.apache.spark.deploy.Client
Arguments:
--supervise
launch
spark://mySparkSMaster:7077
file:/home/myUser/local/jar/TestApp.jar
scala.JoinPerfs
System properties:
SPARK_SUBMIT -> true
spark.driver.supervise -> true
spark.app.name -> scala.JoinPerfs
spark.jars -> file:/home/myUser/local/jar/TestApp.jar
spark.master -> spark://mySparkMaster:7077
Classpath elements:



15/09/03 13:37:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Driver successfully submitted as driver-20150903143706-0004
... waiting before polling master for driver state
... polling master for driver state
State of driver-20150903143706-0004 is ERROR
Exception from cluster was: java.io.IOException: Failed to create directory /Appvg/spark-1.4.1/work/driver-20150903143706-0004
java.io.IOException: Failed to create directory /Appvg/spark-1.4.1/work/driver-20150903143706-0004
        at org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$createWorkingDirectory(DriverRunner.scala:130)
        at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:78)

 

Thanks for your help :)