
Memory issue while running spark application

Explorer

 

Spark 1.6.0:

 

I have a Spark application (with 5 SQL joins and some filtering) that is failing with this error:

 

java.lang.OutOfMemoryError: Unable to acquire 356 bytes of memory, got 0

 

But when I run it with 1000 shuffle partitions, it runs fine:

'SET spark.sql.shuffle.partitions = 1000'
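
The same setting can also be applied programmatically; a minimal Scala sketch for Spark 1.6 (the app name below is just a placeholder):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("joins-app")  // placeholder app name
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Equivalent to: SET spark.sql.shuffle.partitions = 1000
sqlContext.setConf("spark.sql.shuffle.partitions", "1000")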

 

The default number of shuffle partitions is 200. Is it a good idea to use that many shuffle partitions? How do I track down what is causing this?

I hope this is not related to this bug (https://issues.apache.org/jira/browse/SPARK-14363), since I am using version 1.6.0.

 

Thanks

 

 


Re: Memory issue while running spark application

Master Collaborator

Yes, it certainly can help if your join results in an imbalanced distribution of data. Some keys will have many values, and if they all pile onto one partition, it can run out of memory. More partitions may split up the data more evenly.
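
One rough way to check whether the join keys are skewed is to count rows per key before the join; a Scala sketch (the table and column names here are made up for illustration):

import org.apache.spark.sql.functions.desc

// Hypothetical input table and join key, for illustration only
val orders = sqlContext.table("orders")

// Rows per join key, heaviest keys first; if a few keys hold most of the
// rows, their shuffle partitions will be far larger than the rest
orders.groupBy("customer_id")
  .count()
  .orderBy(desc("count"))
  .show(20)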

 

The error is just "out of memory". Unless you see a stack trace similar to the one in https://issues.apache.org/jira/browse/SPARK-14363, I don't think it's necessarily related.

 

So another possible answer is to give more memory to your job, or redesign it a bit to avoid big shuffles.
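
For example, memory and shuffle settings could be raised along these lines (a sketch only; the values are illustrative and the overhead setting assumes YARN):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("joins-app")                            // placeholder app name
  .set("spark.executor.memory", "8g")                 // heap per executor (example value)
  .set("spark.yarn.executor.memoryOverhead", "1024")  // extra off-heap headroom in MB, YARN only
  .set("spark.sql.shuffle.partitions", "1000")        // keep the larger shuffle partition count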
