
Spark on Yarn - How to run multiple tasks in a Spark Resource Pool


Hi,

 

I am running Spark jobs on YARN with HDP 3.1.1.0-78.

 

I have set the Spark scheduler mode to FAIR via the "spark.scheduler.mode" parameter. My fairscheduler.xml is as follows:

 

I have also configured my program to use the "production" pool.

 

[Screenshot: Spark_Fair_Scheduler_1.PNG]
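For reference, a minimal fairscheduler.xml defining a "production" pool generally looks something like the following; this is an illustrative sketch, not my actual file, and the weight/minShare values are assumptions:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Illustrative values: jobs in this pool get twice the weight of
       "default" and a guaranteed minimum share when resources are contended. -->
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="default">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```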

 

Upon running the job, I observed that although 4 stages are active, only 1 stage runs under the "production" pool and the other 3 run under "default".

 

As a result, at any point in time only 2 tasks run in parallel. If I want 3 or more tasks running in parallel, I would need, for example, 2 tasks under "production" and the other 2 under "default".

 

Is there a programmatic way to achieve this, for example by setting configuration parameters?
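For context, these are the configuration knobs I am aware of; the values below are illustrative and the allocation-file path is an assumption:

```
# Enable FAIR scheduling and point Spark at the pool definitions
# (path is illustrative)
spark.scheduler.mode             FAIR
spark.scheduler.allocation.file  /path/to/fairscheduler.xml
```

The pool itself, though, is chosen per thread at runtime via `sc.setLocalProperty("spark.scheduler.pool", "production")` rather than by a global configuration parameter.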

 

Any inputs would be really helpful.

 

Thanks and Regards,

Sudhindra 

1 ACCEPTED SOLUTION

This problem has been solved!

2 REPLIES


Additional Information:

[Screenshot: Active_Stages.PNG]

As we can see, even though 3 stages are active, only 1 task each is running in the "production" and "default" pools.

 

My basic question is: how can we increase parallelism within pools? In other words, how can I make sure that Stage ID 8 in the screenshot above also runs in parallel with the other 2?
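As far as I understand, the pool is a thread-local job property: all jobs submitted from one thread go to that thread's pool and run one after another, so getting more jobs in flight means submitting them from separate threads, each calling `sc.setLocalProperty("spark.scheduler.pool", ...)` before triggering its action. A minimal sketch of that pattern, using plain Python threads as a cluster-free stand-in:

```python
import threading

# Spark keeps "spark.scheduler.pool" as a *thread-local* job property:
# every job submitted from a thread runs in that thread's pool, and jobs
# submitted from the same thread can only run one after another. This
# stand-in mimics that behaviour without a cluster; in real PySpark each
# thread would call sc.setLocalProperty("spark.scheduler.pool", pool)
# and then trigger an action (e.g. df.count()).
_local = threading.local()
results = {}

def run_in_pool(pool, job_id):
    _local.pool = pool               # like sc.setLocalProperty(...)
    # a real Spark action would go here; record which pool it would use
    results[job_id] = _local.pool

threads = [threading.Thread(target=run_in_pool, args=(p, i))
           for i, p in enumerate(["production", "production", "default"])]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Jobs 0 and 1 land in "production", job 2 in "default", and because each
# came from its own thread, all three can be in flight at the same time.
```

With this pattern, stages from different jobs can run concurrently, subject to available executor resources.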

 

Thanks and Regards,

Sudhindra
