
PySpark gets stuck at stage 66

New Contributor

I launched PySpark in local mode and am trying to join two RDDs. I am using Spark 1.6 on the Cloudera VM.

When I run the statement below:

joinData.count()

PySpark gets stuck at stage 66:
[Stage 66:=============================> (4 + 0) / 8]
[Stage 66:=============================> (4 + 0) / 8]
[Stage 66:=============================> (4 + 0) / 8]

 

Any idea why this is happening and how to fix this?

1 REPLY

Re: PySpark gets stuck at stage 66

Expert Contributor

Interesting. Since you are running in local mode, have you tried adjusting the number of threads in local[N] (i.e., increasing or decreasing N)? Also, how many logical cores does the server have? It would help to know what the program is doing and how big the dataset is.
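
For what it's worth, a progress line like (4 + 0) / 8 normally means 4 of 8 tasks have finished and 0 are currently running, so it may be worth checking how many threads local mode actually has. Below is a minimal sketch of picking N for local[N] from the machine's logical core count. The helper names local_master and make_context and the app name "join-debug" are made up for illustration; only SparkConf, SparkContext, setMaster and setAppName come from PySpark itself. This is one way to experiment with N, not a confirmed fix for the hang.

```python
import multiprocessing


def local_master(n_threads=None):
    """Build a local-mode master URL such as 'local[4]'.

    Hypothetical helper: if n_threads is not given, it defaults to the
    number of logical cores reported by the operating system.
    """
    if n_threads is None:
        n_threads = multiprocessing.cpu_count()
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return "local[%d]" % n_threads


def make_context(n_threads=None, app_name="join-debug"):
    """Create a SparkContext in local mode with the chosen thread count.

    Hypothetical helper; pyspark is imported here so that local_master()
    can be used on a machine without Spark installed.
    """
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster(local_master(n_threads)).setAppName(app_name)
    return SparkContext(conf=conf)


if __name__ == "__main__":
    # Show how many logical cores are available and the master URL that
    # would be used by default.
    print("logical cores:", multiprocessing.cpu_count())
    print("default master:", local_master())
```

In a standalone script you could then try sc = make_context(4), check sc.master and sc.defaultParallelism, and rerun the join with different values of N. In the interactive shell a context already exists, so the equivalent is to relaunch it with pyspark --master local[4]. Calling joinData.getNumPartitions() before the count should also show whether the 8 tasks match the number of partitions.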