
Spark submit executor options


Master Collaborator

Hi, I am running CDH 5.4.4 and trying to run a sample Spark app with as much parallelism as possible.

I ran into a problem with the default settings: submitting the application with spark-submit started only two executors, whereas I wanted to run 7 of them.

 

Reading the YARN logs, I realized that Spark requested more memory than I expected, so YARN provided just two containers (actually three, presumably counting the ApplicationMaster) for execution.

 

I thought this could be adjusted via command-line options, but I failed.

 

Setting options such as --num-executors and --executor-memory worked with spark-shell, but spark-submit ignored these command-line options.

 

I tried to edit the /opt/cloudera/parcels/CDH/etc/spark/conf.dist/spark-defaults.conf

adding

spark.executor.memory=600M

spark.executor.instances=7

 

but it didn't work. After editing /etc/spark/conf.cloudera.spark_on_yarn/spark-defaults.conf instead, the Spark configuration changed and the sample application ran as I wanted (7 executors).

 

But I still have a problem, because I want to run one application with 7 executors and another with 3 executors. Changing the config file for every run is not an option for me.
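One idea is a per-application properties file instead of the global one (just a sketch, assuming the standard --properties-file flag; /root/app1.conf is a hypothetical file name):

# /root/app1.conf (hypothetical per-application settings file)
spark.executor.memory=600M
spark.executor.instances=7

spark-submit --master yarn-client --properties-file /root/app1.conf /root/program3/target/scala-2.10/simple-project_2.10-1.0.jar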

 

Any ideas why spark-submit ignores the command line override?

 

Thanks

2 REPLIES

Re: Spark submit executor options

Master Collaborator

spark-submit and spark-shell work the same way. Are you sure your cluster has enough resources to allocate 7 executors of the size you request? It sounds like it didn't, from what you say.

Re: Spark submit executor options

Master Collaborator

I am sure: when I checked the list of executors in the Spark UI, there were seven of them.

Each node has 2 GB for YARN and 2 cores. I have 4 nodes, and YARN is configured with 1 core per container, a 512 MB minimum per container, and a 100 MB increment.

 

So I did a calculation: if I want to run two containers per node, each container gets 1024 MB. 1024 MB minus 384 MB (the Spark memory overhead) leaves 640 MB.

To be safe, I set only 600 MB.
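Spelled out, the arithmetic looks roughly like this (a sketch, assuming the default overhead of max(384 MB, a small fraction of executor memory) and that YARN rounds requests up by its increment):

  600 MB  spark.executor.memory
+ 384 MB  executor memory overhead (max(384 MB, a fraction of 600 MB) = 384 MB)
= 984 MB  requested per executor container
-> rounded up by YARN to about 1000 MB (512 MB minimum, 100 MB increment)
2 x 1000 MB = 2000 MB <= 2048 MB per node, so two executor containers fit on each node
4 nodes x 2 containers = 8 containers, enough for 7 executors plus the ApplicationMaster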

 

As I mentioned above, submitting

 

spark-submit /root/program3/target/scala-2.10/simple-project_2.10-1.0.jar --master yarn-client --executor-memory 600M --num-executors 7

 

launched the app with only two executors.

(screenshot: twoexec.png)
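One thing I am now wondering about: the spark-submit usage lists options before the application JAR, and anything after the JAR is treated as arguments to the application itself, so the flags above may simply never have reached spark-submit. With the same values, that ordering would look like this:

spark-submit --master yarn-client --executor-memory 600M --num-executors 7 /root/program3/target/scala-2.10/simple-project_2.10-1.0.jar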

 

After adding

spark.executor.cores=1
spark.executor.memory=600M
spark.executor.instances=7

to the /etc/spark.... spark-defaults.conf file and submitting

 

spark-submit /root/program3/target/scala-2.10/simple-project_2.10-1.0.jar --master yarn-client

 

launched the application with seven executors.

(screenshot: sevenexec.png)
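The same per-application settings should also be expressible directly on the command line instead of in the config file (a sketch, with the options placed before the JAR):

spark-submit --master yarn-client --conf spark.executor.cores=1 --conf spark.executor.memory=600M --conf spark.executor.instances=7 /root/program3/target/scala-2.10/simple-project_2.10-1.0.jar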

 

 

 
