Use Spark - Yarn to launch ANY application

Contributor

Hi everyone, as the title says, I'd like to run ANY generic application (written by users).
So the idea is:
1) The user writes their own Python application (about anything they want, without using the Spark API)
2) I run this app via spark-submit (roughly the command sketched below)
3) A SINGLE Docker container on YARN is created, with the resources I set

It actually works, but...
Since the users don't use Spark, the executors' role in the Spark mechanism is useless, or rather, the user DOESN'T NEED THEM, so basically the user app should run only in the ApplicationMaster.
I tried to set spark.executor.instances to 0 (in spark-submit), but an error appears during the submit saying that it "must be a positive value".
So, do you know if there is a way to disable/enable executors and just use the AM as I please? (Maybe assign all the resources to the AM and 0 to the executors?)
Because there will also be users that may want to use Spark, and they will have executors as well.

Thanks for any advice on that.
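For reference, the kind of submit I mean looks roughly like this (the Docker image name, script path and resource sizes are just placeholders):

# Sketch of the submit described above: cluster deploy mode, so the user's
# script runs inside the ApplicationMaster container on YARN.
# The YARN_CONTAINER_RUNTIME_* settings assume the cluster has the YARN
# Docker container runtime enabled.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --driver-cores 2 \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=my_user_image:latest \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=my_user_image:latest \
  user_app.py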

1 REPLY

Contributor

Hello @loridigia 

I don't think there is a direct way to achieve this, but there is a workaround.

We can start the Spark job with dynamic allocation enabled, setting the minimum executors to "0", the initial executors to "1", and the executor idle timeout to "5s".

With these configurations, the Spark job will start with 1 executor; once that executor has been idle for more than 5 seconds, its container is killed.

From then on, the Spark application has only the Driver / ApplicationMaster container running.


CONFIGS:

--conf spark.dynamicAllocation.enabled=true
--conf spark.shuffle.service.enabled=true
--conf spark.dynamicAllocation.executorIdleTimeout=5s
--conf spark.dynamicAllocation.initialExecutors=1
--conf spark.dynamicAllocation.maxExecutors=1
--conf spark.dynamicAllocation.minExecutors=0
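
Putting it together on the command line, a submit might look like this (the application file name and driver resources are just examples):

# Example submit: the only long-lived container is the Driver/ApplicationMaster.
# spark.shuffle.service.enabled=true assumes the external shuffle service
# (YARN auxiliary service) is running on the NodeManagers.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --driver-cores 2 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.executorIdleTimeout=5s \
  --conf spark.dynamicAllocation.initialExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=1 \
  --conf spark.dynamicAllocation.minExecutors=0 \
  user_app.py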


NOTE:
We can add these configs to spark-defaults.conf so that they are applied to all jobs submitted from then on.

Please be careful, though: these defaults will also affect actual Spark jobs that do need executors.
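
For example, the same settings in spark-defaults.conf would look like this (keep in mind that maxExecutors=1 would then cap every job that relies on the defaults):

spark.dynamicAllocation.enabled              true
spark.shuffle.service.enabled                true
spark.dynamicAllocation.executorIdleTimeout  5s
spark.dynamicAllocation.initialExecutors     1
spark.dynamicAllocation.maxExecutors         1
spark.dynamicAllocation.minExecutors         0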


If this resolves your issue, make sure to mark the answer as the accepted solution!