
Memory issue while running spark application





Spark 1.6.0:


I have a Spark application (with 5 SQL joins and some filtering) that fails with the following error:


java.lang.OutOfMemoryError: Unable to acquire 356 bytes of memory, got 0


But when I run it with 1000 shuffle partitions, it runs fine:

'SET spark.sql.shuffle.partitions = 1000'


The default number of shuffle partitions is 200. Is it a good idea to use this many shuffle partitions? How do I track down what is causing this?

I hope this is not related to this bug, since I am using version 1.6.0.






Re: Memory issue while running spark application

Master Collaborator

Yes, it certainly can help if your join results in an imbalanced distribution of data. Some keys will have many values, and if they all pile onto one partition, it could run out of memory. More partitions may split up the data more evenly.
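A rough sketch (plain Python, not Spark itself) of why raising spark.sql.shuffle.partitions can even out a shuffle: with more partitions, fewer distinct join keys collide into the same partition, so the most loaded partition holds less data. The key names and counts below are made up for illustration.

```python
# Simulate hash-partitioning of join keys, as a shuffle would do,
# and compare the largest partition for 200 vs 1000 partitions.
import zlib
from collections import Counter

def max_rows_per_partition(keys, num_partitions):
    """Assign each row to a partition by hashing its key and
    return the row count of the most loaded partition."""
    counts = Counter(zlib.crc32(k.encode()) % num_partitions for k in keys)
    return max(counts.values())

# Hypothetical data: 10,000 distinct keys, 20 rows each -> 200,000 rows.
rows = [f"key_{i}" for i in range(10_000) for _ in range(20)]

print(max_rows_per_partition(rows, 200))    # default partition count
print(max_rows_per_partition(rows, 1000))   # the asker's setting
```

Note the caveat implied above: if a single key is extremely hot, all of its rows still hash to one partition no matter how many partitions you configure, so more partitions only help when the load is spread across many keys.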


The error is just "out of memory". Unless you see a stack trace similar to the one in that bug report, I don't think it's necessarily related.


So another possible answer is to give your job more memory, or redesign it a bit to avoid big shuffles.
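If you go the more-memory route, the usual place to set it is at submit time. A sketch, assuming a standard spark-submit deployment; the class name, jar, and sizes are placeholders to tune for your cluster:

```shell
# Give executors and the driver more heap, and keep the higher
# shuffle-partition count that already worked for this job.
spark-submit \
  --class com.example.MyJoinJob \
  --executor-memory 8g \
  --driver-memory 4g \
  --conf spark.sql.shuffle.partitions=1000 \
  my-join-job.jar
```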
