Spark Submit Multiple Jobs in Cluster Environment

Contributor

Is it possible to submit multiple jobs with spark-submit?

I have created multiple jobs that need to be submitted via spark-submit.

I have to submit these jobs in cluster deploy mode.

Is it possible to assign the jobs to specific cores in cluster deploy mode?

e.g. job 1 - core 1

job 2 - cores 2 and 3

and so on...

Is it possible to schedule the jobs onto the desired cores?

Note: the cores should be assigned manually by the user.

2 REPLIES 2

Re: Spark Submit Multiple Jobs in Cluster Environment

@Sridhar Babu M

Well, in general you can simply run multiple instances of spark-submit in a shell for loop with a dynamic number of cores.

Like:

for i in 1 2 3
do
  # application jar goes last; the loop variable drives the core count
  spark-submit --class <main-class> --executor-memory 2g --executor-cores "$i" --master yarn --deploy-mode cluster </path/to/app.jar>
done

Now, for scheduling a Spark job you can use Oozie to schedule and run your Spark action (oozie-spark), or you may try running the Spark program directly using an Oozie shell action.

Re: Spark Submit Multiple Jobs in Cluster Environment

Guru

@Sridhar Babu M

Since cores per container are controlled by YARN configuration, I believe you will need to set the number of executors and the number of cores per executor based on your YARN configuration to control how many executors and cores get scheduled.

So if you set YARN to allocate 1 core per container and you want two cores for the job, then ask for 2 executors with 1 core each from spark-submit. That should give you two containers with 1 executor each. I don't think YARN will give you an executor with 2 cores if a container can only have 1 core. But if you can have 8 cores per container, then you can have 8 executors with 1 core, or 4 executors with 2 cores each. Of course, you can continue to add executors as long as your YARN queue has capacity for more containers.

# Run on a YARN cluster

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --executor-memory 2G --num-executors 2 --executor-cores 1 /path/to/examples.jar
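As a sketch of the second scenario described above (a YARN setup that allows up to 8 cores per container), the same SparkPi example could instead be submitted as 4 executors with 2 cores each. This assumes your YARN configuration (e.g. yarn.scheduler.maximum-allocation-vcores) permits at least 2 vcores per container:

```shell
# Run SparkPi on YARN with 4 executors of 2 cores each.
# Assumes the cluster's YARN config allows >= 2 vcores per container
# and the queue has capacity for 4 containers plus the driver.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 2G \
  --num-executors 4 \
  --executor-cores 2 \
  /path/to/examples.jar
```

Either shape requests 8 total executor cores; which one performs better depends on your container sizing and workload.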
