
pyspark gets stuck on launch - not responding - i am running out of time

Rising Star

Hello, 

 

Due to limitations of Spark ML 1.6, I had to upgrade to Spark 2; every configuration is fine.

 

I have a 4-host cluster. If I launch pyspark from the master, it gets stuck at launch, or otherwise it shows a warning that it couldn't bind the UI port and is trying to connect on 4041, and so on.

 

The strange thing here is that all those ports are unoccupied. Can somebody help?

 

 

(screenshots attached: 77777.png, 40530779_518637238564555_3673338773531262976_n.png)

4 REPLIES

Re: pyspark gets stuck on launch - not responding - i am running out of time

Master Collaborator
I don't know why it gets stuck, but the warning is about the open ports. Each Spark program (driver) opens a UI port, starting from 4040 onwards. So when 4040 is occupied (by another Spark driver) it tries 4041, and so on, until it reaches the maximum number of retries and returns an error.
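
A rough sketch of that behaviour (the values below are illustrative, not from this thread): the driver starts at spark.ui.port (4040 by default) and, on each bind failure, tries the next port, up to spark.port.maxRetries attempts before giving up.

export SPARK_MAJOR_VERSION=2
# start the UI port search at 4040 and allow up to 32 successive ports before failing
pyspark --master yarn --conf spark.ui.port=4040 --conf spark.port.maxRetries=32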

Re: pyspark gets stuck on launch - not responding - i am running out of time

Champion

@hadoopNoob

 

Yes, it may be due to the port. Please try the below:

 

export SPARK_MAJOR_VERSION=2
pyspark --master yarn --conf spark.ui.port=12888
pyspark --master yarn --conf spark.ui.port=4041
pyspark --master yarn --conf spark.ui.port=4042
etc
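
If the shell does come up with one of the commands above, a quick sanity check (run on the driver node; 12888 matches the first example) is to confirm the UI actually bound to the chosen port:

# should return the Spark UI's HTML if the driver is listening on 12888
curl -s http://localhost:12888 | head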

Re: pyspark gets stuck on launch - not responding - i am running out of time

Rising Star

I tried your suggestion already, but did it again and now it gets stuck here. Upon using Ctrl+C it skips to the executor.

 

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

 

 

 

The strange thing is it works fine on the other nodes. Should I use them instead?

Re: pyspark gets stuck on launch - not responding - i am running out of time

Champion

@hadoopNoob

 

If the command is working on the other nodes, then run the netstat command on both nodes (for the ports starting from 4040) to see the difference.

 

It is clear that it is not a Spark issue, as it is working from the other nodes, so you have to identify the port open/availability status.
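
As a sketch of that comparison (assuming Linux nodes; exact flags may differ, and ss can be used where netstat is missing):

# run this on the node where pyspark hangs and on a node where it works,
# then compare which ports in the 4040-4099 range are already listening
netstat -tlnp | grep -E ':40[4-9][0-9]'
# alternative if netstat is not installed
ss -tlnp | grep -E ':40[4-9][0-9]'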