New Contributor
Posts: 1
Registered: ‎04-02-2018

PySpark gets stuck at stage 66

I have launched pyspark in local mode and am trying to join two RDDs. I am using Spark version 1.6 on the Cloudera VM.

When I run the statement below:

joinData.count()

 

pyspark gets stuck at Stage 66:
[Stage 66:=============================> (4 + 0) / 8]
[Stage 66:=============================> (4 + 0) / 8]
[Stage 66:=============================> (4 + 0) / 8]

 

Any idea why this is happening and how to fix this?

Cloudera Employee
Posts: 66
Registered: ‎11-16-2015

Re: PySpark gets stuck at stage 66

Interesting. Since you are running in local mode, have you already tried adjusting the number of threads in local[N] (i.e., increasing or decreasing the value of N)? Also, how many logical cores do you have on the server? It would also help to know what the program is doing and how big the dataset is.
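
For reference, here is a minimal sketch of how you could check the logical core count and build a local[N] master setting from it. The helper names local_master and make_context and the app name "join-test" are hypothetical, chosen only for illustration; the sketch assumes pyspark is importable wherever make_context is called.

```python
import multiprocessing


def local_master(threads=None):
    """Return a Spark master string for local mode, e.g. 'local[4]'.

    If threads is None, use the number of logical cores on this machine.
    (Hypothetical helper, not part of Spark.)
    """
    if threads is None:
        threads = multiprocessing.cpu_count()
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return "local[%d]" % threads


def make_context(threads=None, app_name="join-test"):
    """Create a SparkContext that runs locally with the given thread count.

    pyspark is imported lazily so the helper above works without Spark.
    The app name is a placeholder.
    """
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster(local_master(threads)).setAppName(app_name)
    return SparkContext(conf=conf)


if __name__ == "__main__":
    # Show the core count and the master string that would be used.
    print("logical cores:", multiprocessing.cpu_count())
    print("master:", local_master())
```

If you are using the interactive shell instead, the same setting can be passed at launch with pyspark --master local[4]. Once the shell is up, sc.defaultParallelism and joinData.getNumPartitions() show how many tasks Spark can run at once and how many partitions the joined RDD has.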

 
