
Help Spark Master for Sandbox VM?

Explorer

I have downloaded the Sandbox VM and it is working fine. Now I need to create a SparkContext manually, as in the code below, but how do I find the Spark master name to use here instead of "local"?

val conf = new SparkConf().setMaster("local").setAppName("My App")

1 ACCEPTED SOLUTION

Hi @Rajendra Vechalapu, you can omit setting the master in your source; see this example:

val conf = new SparkConf().setAppName("Spark Pi")
val spark = new SparkContext(conf)

You can then launch your application using spark-submit and provide the master there using the "--master" and "--deploy-mode" options. Refer to the Spark programming guide for this and other useful hints.

Edit: When you run spark-submit on the Sandbox, be sure to supply additional arguments for the master, num-executors, driver-memory, executor-memory, and executor-cores, as given below. Note that larger values for the last four arguments will not work on the Sandbox! Follow (and also try) this example, which computes Pi in Python (as any user who has access to HDFS/YARN):

cd /usr/hdp/current/spark-client/
spark-submit --master yarn-client --num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1 examples/src/main/python/pi.py 10

"--master yarn-cluster" works too. You can also set these four options in spark-env in Ambari; they are already there but commented out, and not all with the values shown here. See also the Spark guide on HDP.
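For reference, a quick sketch of the master URL forms that setMaster (or --master) commonly accepts; note the host name and port below are placeholders for illustration, not values taken from the Sandbox:

```scala
import org.apache.spark.SparkConf

// Common master URL forms (Spark 1.x era; "master-host" and 7077 are placeholders):
val local      = new SparkConf().setMaster("local[2]").setAppName("My App")        // local mode, 2 worker threads
val yarnClient = new SparkConf().setMaster("yarn-client").setAppName("My App")     // YARN, driver runs on the client
val standalone = new SparkConf().setMaster("spark://master-host:7077").setAppName("My App") // standalone cluster
```

Hardcoding any of these ties the jar to one environment, which is why the accepted answer recommends leaving setMaster out and passing --master to spark-submit instead.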




Explorer

Thanks, it's working.

New Contributor

If I want to run from outside the VM, for example in Eclipse, what are the master IP and port? Thanks.