Hive and Spark are using more memory than the maximum allocated in YARN

New Contributor

Is there any way to keep memory utilization from exceeding the maximum memory allocated in the YARN ResourceManager?


My YARN configuration is:

yarn.scheduler.minimum-allocation-mb = 1024

yarn.scheduler.maximum-allocation-mb = 4096


yarn.scheduler.minimum-allocation-vcores = 3

yarn.scheduler.maximum-allocation-vcores = 3


Problem:

YARN appears to be ignoring the static configuration set above.


It is using more memory than that. In the YARN UI I see these values:

Running containers: 72

Allocated CPU vcores: 72

Allocated memory (MB): 120034


Please help me limit usage to the maximum set in the YARN configuration.


Thanks a lot



Re: Hive and Spark are using more memory than the maximum allocated in YARN

Contributor

Hi @Vasanth Reddy,


If I understood you correctly, I think you should be checking this:


https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html#Setting_u...


These are per-container properties:

yarn.scheduler.minimum-allocation-mb = 1024

yarn.scheduler.maximum-allocation-mb = 4096

yarn.scheduler.minimum-allocation-vcores = 3

yarn.scheduler.maximum-allocation-vcores = 3


What you are looking at, however, are cluster-wide metrics:

Running containers: 72

Allocated CPU vcores: 72

Allocated memory (MB): 120034


With the above settings, those same 72 containers could at some point reach an allocated memory of 294912 MB (72 containers * 4096 MB maximum per container).
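
To make the per-container versus cluster-wide distinction concrete, here is a small Python sketch of the arithmetic. It only uses the numbers quoted in this thread; the function name `cluster_memory_bounds` is made up for illustration and is not part of any YARN API.

```python
# Per-container limits quoted in the question (yarn.scheduler.*-allocation-mb).
MIN_ALLOCATION_MB = 1024
MAX_ALLOCATION_MB = 4096

# Cluster-wide figures quoted from the YARN UI.
running_containers = 72
allocated_memory_mb = 120034


def cluster_memory_bounds(containers, min_mb, max_mb):
    """Hypothetical helper: total memory if every container sat at the
    per-container minimum, and if every container sat at the maximum."""
    return containers * min_mb, containers * max_mb


low, high = cluster_memory_bounds(
    running_containers, MIN_ALLOCATION_MB, MAX_ALLOCATION_MB
)
average_mb = allocated_memory_mb / running_containers

print(low, high)          # 73728 294912
print(round(average_mb))  # 1667
```

The reported 120034 MB works out to roughly 1667 MB per container on average, which is inside the 1024 to 4096 MB per-container range, so the cluster-wide total does not by itself indicate that the per-container maximum is being ignored.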


Let me know if I misunderstood your question.


BR,

David Bompart

Re: Hive and Spark are using more memory than the maximum allocated in YARN

New Contributor

Hi @dbompart

Yes, the logic you described makes perfect sense.


I need some more clarification regarding containers for MapReduce and Spark.


On MapReduce I am running a Sqoop import.

On Spark I am running the PySpark shell on top of YARN.


The configuration is now:

MapReduce:

yarn.scheduler.maximum-allocation-mb: 36864 * 2 = 73728


My concern now is how to limit the number of running containers on a per-user basis (I can't set up different queues in the Capacity Scheduler as mentioned above).


-> Whenever I run a Spark application, it also runs on top of YARN, and I see:


Running containers: 3

Allocated CPU vcores: 3


Total memory allocated (MB): 5120


Could you help me understand the logic behind these numbers?


Thanks a lot
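
For what it is worth, the Spark figures above (3 containers, 5120 MB) would be consistent with stock Spark-on-YARN defaults. The sketch below is only a reconstruction under assumptions that are not confirmed anywhere in this thread: the PySpark shell runs in client mode with 2 executors of 1024 MB each and a 512 MB application master, each container gets a memory overhead of max(384 MB, 10% of its memory), yarn.scheduler.minimum-allocation-mb is still 1024, and the scheduler rounds every request up to a multiple of that minimum. The helper names are hypothetical.

```python
import math

MIN_ALLOCATION_MB = 1024  # assumed unchanged from the first post


def round_up(value_mb, step_mb):
    """Round a request up to the next multiple of step_mb."""
    return math.ceil(value_mb / step_mb) * step_mb


def container_mb(memory_mb, min_alloc_mb, min_overhead_mb=384, overhead_fraction=0.10):
    """Hypothetical helper: requested memory plus overhead, rounded up
    to a multiple of the scheduler's minimum allocation."""
    overhead_mb = max(min_overhead_mb, int(memory_mb * overhead_fraction))
    return round_up(memory_mb + overhead_mb, min_alloc_mb)


# Assumed defaults: 2 executors of 1024 MB, one 512 MB application master.
executor_mb = container_mb(1024, MIN_ALLOCATION_MB)  # 1024 + 384 = 1408, rounded up
am_mb = container_mb(512, MIN_ALLOCATION_MB)         # 512 + 384 = 896, rounded up

containers = 2 + 1
total_mb = 2 * executor_mb + am_mb

print(executor_mb, am_mb)    # 2048 1024
print(containers, total_mb)  # 3 5120
```

If the actual job uses different executor counts, memory settings, or a scheduler with a different rounding increment, the breakdown will differ even if the total happens to match.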