Unable to run multiple pyspark sessions


I am new to Cloudera. I have installed Cloudera Express on a CentOS 7 VM and created a cluster with 4 nodes (another 4 VMs). I ssh to the master node and run: pyspark

This works but only for one session. If I open another console and run pyspark I will get the following error:


WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.


And it gets stuck there and does nothing until I close the other session running pyspark! Any idea why this is happening and how I can fix it so that multiple sessions/users can run pyspark? Am I missing some configuration somewhere?


Thanks in advance for your help.





In general, a port can be bound by only one session at a time. Your 1st session binds to the default port 4040; your 2nd session tries the same port, hits the bind issue, and then tries the next port, but that is not working.


There are two things that you need to check:

1. Please make sure that port 4041 is open.

2. In your second session, when you run pyspark, pass an available port as a parameter.


     Ex: A long time ago I used spark-shell with a different port as a parameter; please try a similar option for pyspark.

     session1: $ spark-shell --conf spark.ui.port=4040
     session2: $ spark-shell --conf spark.ui.port=4041


     If 4041 is not working, you can try up to 4057; I think these are the ports available to Spark by default.
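Before passing a port to `--conf spark.ui.port=...`, it can help to confirm that the port is actually free on the node. The following is a minimal sketch, not part of the original advice: the function name `first_free_port` is hypothetical, and the 4040 to 4057 range is simply the one mentioned above, not a verified Spark default. It only tests whether a local bind succeeds; it says nothing about firewall rules between machines.

```python
import socket


def first_free_port(start=4040, end=4057, host="0.0.0.0"):
    """Return the first port in [start, end] that can be bound locally, or None.

    The default range is an assumption taken from the post above.
    """
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Port is already in use (or not permitted); try the next one.
                continue
            return port
    return None


if __name__ == "__main__":
    print("first free Spark UI port candidate:", first_free_port())
```

The printed port could then be passed on the command line in the same way as the spark-shell example, for instance `pyspark --conf spark.ui.port=<port>`.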


Thank you for your help. I tried different ports, but it still doesn't work unless I kill the running session and start another one. Could it be that I had a wrong configuration during the Cloudera installation? Or do changes need to be made in a configuration file or somewhere else?




Did you get a chance to check my first question (whether port 4041 is open)?


@saranvisa Sorry forgot to mention that... yes I did. The port is open.




Can you try to run the 2nd pyspark command from a different user ID?


It seems this is a normal issue, according to the link below.





Just tried that. It's not working for different users either.


It looks like things cannot run in parallel, only in a queue. Maybe I missed or misconfigured something in the installation process.

Expert Contributor


WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.


^ This generally means that the problem is beyond the port mapping (i.e. with the queue configuration, the available resources, or at the YARN level).


Assuming that you are using Spark 1.6, I'd suggest temporarily changing the shell logging level to INFO to see if that gives a hint. The quick and easy way to do this is to edit /etc/spark/conf/ on the node where you are running pyspark and change the log level from WARN to INFO.


# vi /etc/spark/conf/ 

$ spark-shell
18/04/10 20:40:50 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
18/04/10 20:40:50 INFO util.Utils: Successfully started service 'SparkUI' on port 4041.
18/04/10 20:40:50 INFO client.RMProxy: Connecting to ResourceManager at
18/04/10 20:40:52 INFO impl.YarnClientImpl: Submitted application application_1522940183682_0060
18/04/10 20:40:54 INFO yarn.Client: Application report for application_1522940183682_0060 (state: ACCEPTED)
18/04/10 20:40:55 INFO yarn.Client: Application report for application_1522940183682_0060 (state: ACCEPTED)
18/04/10 20:40:56 INFO yarn.Client: Application report for application_1522940183682_0060 (state: ACCEPTED)
18/04/10 20:40:57 INFO yarn.Client: Application report for application_1522940183682_0060 (state: ACCEPTED)
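To spot a shell that is stuck rather than just starting slowly, the repeated "Application report" lines can be counted mechanically. This is a small sketch under the assumption that the INFO output has been saved to a file or captured as a list of lines; the function name `latest_states` is hypothetical and the regular expression is written only against the log format shown above.

```python
import re

# Matches lines such as:
# "... INFO yarn.Client: Application report for application_1522940183682_0060 (state: ACCEPTED)"
REPORT = re.compile(r"Application report for (application_\d+_\d+) \(state: (\w+)\)")


def latest_states(log_lines):
    """Map each application id to (last reported state, consecutive reports of that state)."""
    result = {}
    for line in log_lines:
        match = REPORT.search(line)
        if match is None:
            continue
        app_id, state = match.groups()
        prev_state, count = result.get(app_id, (None, 0))
        # Extend the streak if the state is unchanged, otherwise start a new one.
        result[app_id] = (state, count + 1 if state == prev_state else 1)
    return result
```

Fed the four report lines from the log above, this returns ACCEPTED with a count of 4 for application_1522940183682_0060. A count that keeps growing while the state stays ACCEPTED suggests that YARN has registered the application but has not yet given it resources.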



Next, open the Resource Manager UI and check the state of the application (i.e. your second invocation of pyspark) -- whether it is registered but just stuck in the ACCEPTED state, like this:


Screen Shot 2018-04-11 at 9.22.53 am.png



If yes, look at the Cluster Metrics row at the top of the RM UI page and see if there are enough resources available:


Screen Shot 2018-04-11 at 9.31.10 am.png
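The same numbers shown in the Cluster Metrics row are also exposed by the ResourceManager REST API at `/ws/v1/cluster/metrics`, so the check can be scripted. The sketch below assumes the RM web address is known (the URL in the usage comment is a placeholder, not taken from this thread), and the memory and vcore requirements passed to `resource_shortfall` are made-up illustrative values; both function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url):
    """GET <rm_url>/ws/v1/cluster/metrics and return its 'clusterMetrics' dict."""
    with urlopen(rm_url.rstrip("/") + "/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def resource_shortfall(metrics, need_mb, need_vcores):
    """Report how much memory and how many vcores are missing for one more application."""
    return {
        "missing_mb": max(0, need_mb - metrics["availableMB"]),
        "missing_vcores": max(0, need_vcores - metrics["availableVirtualCores"]),
        "apps_pending": metrics["appsPending"],
    }


# Usage (placeholder host; 8088 is the usual RM web port):
# metrics = fetch_cluster_metrics("http://<resourcemanager-host>:8088")
# print(resource_shortfall(metrics, need_mb=1024, need_vcores=1))
```

A non-zero shortfall together with a pending application would be consistent with the second shell waiting in ACCEPTED. Note that an application can also wait when the cluster as a whole has free resources but its queue or resource pool does not.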



Now kill the first pyspark session and check whether the second session changes to the RUNNING state in the RM UI. If yes, look at the queue placement rules and stats in Cloudera Manager > Yarn > Resource Pools Usage (and Configuration).



Screen Shot 2018-04-11 at 9.59.44 am.png



Hopefully, this will give us some more clues. Let us know how it goes. Feel free to share screenshots from the RM UI and the spark-shell INFO logging.


Thanks. I really appreciate your response. My advisor actually found out that this will work if we use the following command:


$ pyspark --master local[i]


where i is a number. With this command, multiple pyspark shells can run concurrently. But why the other solutions did not work, I have no clue!

Expert Contributor

@hedy thanks for sharing.


The workaround you received makes sense when you are not using any cluster manager(?)


Local mode (--master local[i]) is generally used when you want to test or debug something quickly, since only one JVM is launched on the node from which you run pyspark, and this JVM acts as driver, executor, and master, all in one. But of course with local mode you lose the scalability and resource management that a cluster manager provides. If you want to debug why simultaneous Spark shells are not working when using Spark-on-YARN, we need to diagnose this from the YARN perspective (see the troubleshooting steps shared in the last post). Let us know.
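To make the meaning of the `i` in `local[i]` concrete: Spark treats it as the number of worker threads inside that single JVM, with `local` meaning one thread and `local[*]` meaning one per logical core. The helper below is only an illustration of that convention (the name `local_worker_threads` is hypothetical, and forms such as `local[N,F]` are deliberately not handled); it returns None for any master that is not a plain local one, such as a YARN master.

```python
import os
import re


def local_worker_threads(master):
    """Return the worker-thread count implied by a local master string, else None."""
    if master == "local":
        return 1
    match = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if match is None:
        # Not a plain local master (for example a YARN master): a cluster manager decides.
        return None
    if match.group(1) == "*":
        # One thread per logical core; fall back to 1 if the count is unknown.
        return os.cpu_count() or 1
    return int(match.group(1))
```

Because each local-mode shell runs entirely in its own JVM and never asks YARN for containers, two such shells do not compete for cluster resources, which is consistent with the workaround running concurrently.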

New Contributor

I am facing the same issue. Can anyone please suggest how to resolve it? When I run two Spark applications, one remains in the ACCEPTED state while the other is running.

What configuration needs to be done for this to work?


Following is the dynamic resource pool configuration:

resource pool.JPG

Please help!