pyspark multiple sessions


We have a user who wants to run multiple pyspark sessions on the edge node. Past 13 sessions, however, new sessions simply hang, apparently with the message: could not bind on port 4040, etc.

These are just blank sessions with no jobs executed. The user is simply testing the connections.
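To illustrate why the bind message might point at ports rather than resources, here is a minimal sketch of a bind-with-retries scheme. It assumes the behavior described in the Spark configuration docs: a service starts at a base port (4040 for the UI) and tries successive ports up to `spark.port.maxRetries` times (default 16). The function names `bind_with_retries` and `open_sessions` are hypothetical; this is a plain-socket simulation, not Spark's own code.

```python
import socket


def bind_with_retries(start_port, max_retries, host="127.0.0.1"):
    """Try start_port, start_port + 1, ... up to start_port + max_retries.

    Returns (socket, port) on success, or (None, None) if every candidate
    port is already in use.
    """
    for offset in range(max_retries + 1):
        port = start_port + offset
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen(1)
            return sock, port
        except (OSError, OverflowError):
            # Port taken (or out of range): release the socket, try the next.
            sock.close()
    return None, None


def open_sessions(count, start_port, max_retries):
    """Simulate `count` sessions that each hold one port from the same range."""
    socks, ports = [], []
    for _ in range(count):
        sock, port = bind_with_retries(start_port, max_retries)
        if sock is not None:
            socks.append(sock)
        ports.append(port)  # None marks a session that found no free port
    return socks, ports
```

In this simulation, a base port of 4040 with 16 retries leaves room for at most 17 concurrent sessions, fewer if other processes already hold ports in that range. Whether that accounts for the limit of 13 seen here is a guess that would need checking on the edge node.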

They tried specifying the ports explicitly and could then run more than 13 sessions simultaneously. But they don't want to do this.
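For reference, specifying the port per session could look like the sketch below, which builds a `pyspark` command line using the documented settings `spark.ui.port`, `spark.ui.enabled`, and `spark.port.maxRetries`. The helper names `find_free_port` and `pyspark_command` are hypothetical, and the last two options are only possible alternatives to hand-picked ports, not something the user has tried.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused port (it may be taken again later)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


def pyspark_command(ui_port=None, disable_ui=False, max_retries=None):
    """Build an argument list for launching the pyspark shell."""
    cmd = ["pyspark"]
    if disable_ui:
        # No web UI means no UI port to bind at all.
        cmd += ["--conf", "spark.ui.enabled=false"]
    elif ui_port is not None:
        # Pin the UI to an explicit port instead of the default 4040.
        cmd += ["--conf", f"spark.ui.port={ui_port}"]
    if max_retries is not None:
        # Widen the range of ports tried after the starting port.
        cmd += ["--conf", f"spark.port.maxRetries={max_retries}"]
    return cmd
```

A call such as `pyspark_command(ui_port=find_free_port())` yields a list that can be passed to `subprocess.run`. Note that another process could grab the port between the check and the launch, so this is a convenience rather than a guarantee.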

So are new sessions hanging due to a resource crunch?

The edge node has 138 cores.

I had the user try :

pyspark --master local[4]

With the above command he could create more than 13 sessions.

But does the above command restrict Spark to running only on the local host? Will a job executed this way not scale across the cluster?

Likewise, can the command below be used to run Spark across the cluster with a specified number of cores:

pyspark --master=yarn --deploy-mode=cluster --executor-cores 4

Appreciate the insights.