
spark 2 memory error


Hello,

I am running Spark 2.1 (spark2.1.cloudera1) on YARN in CDH 5.7.1.

I have a PySpark job, submitted in yarn-cluster mode, that always fails.

There is no error in the stdout/stderr of the driver, but in the logs of the NodeManager that hosted the driver I saw the error:
yarn container is running beyond physical memory limits

The Spark application is very big: it consists of 1000+ jobs and should take about 20 hours.

Unfortunately, I can't post my code, but I can confirm that driver-side actions (e.g. collect) are only applied to a few rows, so the code shouldn't crash on driver memory.

For context, I gave the driver 70 GB of memory (spark.driver.memory), but it still crashes after roughly 4 hours.
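For reference, the job was submitted roughly like this (a sketch, not my exact command; the script name is a placeholder):

    spark2-submit \
      --master yarn \
      --deploy-mode cluster \
      --conf spark.driver.memory=70g \
      my_job.py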

 

I tried to tune it with some parameters, but nothing helped.

Does anyone have suggestions for parameters I can try, or for what I should do in a case like this?

I suspect the problem is with YARN, because the same job runs fine on my Spark standalone cluster with the same configuration.


Re: spark 2 memory error

@Sparky24

 

It seems your jobs are running on YARN. You mentioned that you tried some parameters, but it is not clear which parameters you tried or what values you used for them.

Anyhow, in general this issue is possible when some of your parameters do not meet the criteria below (a worked example follows the list):

1. yarn.scheduler.minimum-allocation-mb <= mapreduce.map.memory.mb

2. yarn.scheduler.maximum-allocation-mb <= yarn.nodemanager.resource.memory-mb

3. mapreduce.map.memory.mb <= yarn.scheduler.maximum-allocation-mb

4. mapreduce.map.java.opts = (mapreduce.job.heap.memory-mb.ratio) * (mapreduce.map.memory.mb)

5. Do the same for the reducer (the mapreduce.reduce.* equivalents)
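For example (illustrative numbers only, not a recommendation): with yarn.nodemanager.resource.memory-mb = 102400, yarn.scheduler.maximum-allocation-mb = 102400, yarn.scheduler.minimum-allocation-mb = 4096 and mapreduce.map.memory.mb = 4096, criteria 1-3 all hold; and with a mapreduce.job.heap.memory-mb.ratio of 0.8 (the Hadoop default), the map heap would be 0.8 * 4096 MB, i.e. mapreduce.map.java.opts = -Xmx3276m.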

 

Hope this helps you.


Re: spark 2 memory error

The basic parameter is spark.driver.memory, which was set to 70g.

I also tried passing more parameters, such as:
spark.driver.extraJavaOptions (giving it -Xms)

spark.driver.memoryOverhead

and more parameters whose names I can't remember (something like spark.driver.nonHeapEnabled)
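Roughly, one of those attempts looked like this (a sketch, not my exact command, under the assumption that the overhead was passed as spark.yarn.driver.memoryOverhead, which is the Spark 2.1-on-YARN name for it, set in MB; the values and script name are examples only):

    spark2-submit \
      --master yarn \
      --deploy-mode cluster \
      --conf spark.driver.memory=70g \
      --conf spark.yarn.driver.memoryOverhead=10240 \
      --conf spark.driver.extraJavaOptions=-Xms8g \
      my_job.py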

 

I don't understand how the parameters you gave me should help...

yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb are set through Cloudera Manager (min 4 GB, max 100 GB).

It's a Spark job, so why would I need those "mapreduce.map.memory.mb" parameters?

And as for point 5: there is no reducer.


Re: spark 2 memory error

@Sparky24

 

Please go to CM -> Spark -> Configuration and search for "YARN (MR2 Included) Service". If it is set to "YARN (MR2 Included)", then the Spark service instance has a dependency on YARN (MR2 Included).

I assume it is enabled in your case (I may be wrong), because your error, 'yarn container is running beyond physical memory limits', points to a YARN-related issue.

If my understanding above is correct, then I hope this answers your question!
