
YARN memory is 100% full and some jobs fail. Is there any solution? I have attached my YARN configuration.

Expert Contributor

Hi, I am using HDP 2.1.2 on a 7-node PROD cluster (5 data nodes and 2 name nodes). The name nodes have 32 cores and 256 GB RAM; the data nodes have 24 cores and 125 GB RAM.

Attachment: yarn-cacity-scheduler.png

1 ACCEPTED SOLUTION

Super Guru

@sankar rao You need to check the scheduler page of the Resource Manager UI, http://RM_UI:8088/cluster/scheduler, and look at the usage under "Application Queues". If you see a few jobs taking much longer than expected (say, hours), you might have to kill those jobs and re-run them after tuning.
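The same per-queue usage can also be read programmatically. The sketch below assumes the ResourceManager REST endpoint `/ws/v1/cluster/scheduler` is reachable and that the Capacity Scheduler response nests queues under `schedulerInfo` with `queueName` and `usedCapacity` fields; field names can differ between Hadoop versions, so treat them as assumptions to verify on your cluster. The queue names in the sample (`default`, `etl`) are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_scheduler(rm_url):
    """Fetch scheduler info from the ResourceManager REST API (not called here)."""
    with urlopen(f"{rm_url}/ws/v1/cluster/scheduler", timeout=10) as resp:
        return json.load(resp)


def queue_usage(scheduler_json):
    """Return [(queue_name, used_capacity_percent), ...] for all queues, nested ones included."""
    results = []

    def walk(queue):
        results.append((queue.get("queueName"), queue.get("usedCapacity", 0.0)))
        # Leaf queues have no "queues" key; parent queues nest children under queues.queue.
        for child in (queue.get("queues") or {}).get("queue", []):
            walk(child)

    walk(scheduler_json["scheduler"]["schedulerInfo"])
    return results


# Hand-written sample shaped like a Capacity Scheduler response (not real cluster output).
sample_scheduler = {
    "scheduler": {
        "schedulerInfo": {
            "queueName": "root",
            "usedCapacity": 100.0,
            "queues": {
                "queue": [
                    {"queueName": "default", "usedCapacity": 120.0},
                    {"queueName": "etl", "usedCapacity": 60.0},
                ]
            },
        }
    }
}

for name, used in queue_usage(sample_scheduler):
    print(f"{name}: {used:.1f}% of configured capacity in use")
```

Against a live cluster you would pass the result of `fetch_scheduler("http://RM_UI:8088")` to `queue_usage` instead of the sample dictionary.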

The other option is to make sure your MR/YARN settings are tuned to use the whole memory of the cluster. For tuning, refer to:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/determi...
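As a rough illustration of that kind of tuning, here is a minimal sketch of the container-sizing heuristic commonly cited from the HDP manual-install guide, as best I recall it; please verify the formula and the reserved-memory values against the linked page. The core count and RAM come from the data nodes described in the question. The disk count (12), the 24 GB reserved for the OS and other services, and the 2048 MB minimum container size are assumptions, not values from this thread.

```python
def container_sizing(cores, disks, total_ram_mb, reserved_mb, min_container_mb):
    """Estimate containers per node and RAM per container (MB) for one data node."""
    available_mb = total_ram_mb - reserved_mb
    # Containers are bounded by CPU, disk spindles, and available memory.
    containers = max(1, int(min(2 * cores, 1.8 * disks, available_mb / min_container_mb)))
    ram_per_container = max(min_container_mb, available_mb // containers)
    return containers, ram_per_container


def yarn_settings(containers, ram_per_container):
    """Map the sizing result onto the usual YARN/MapReduce memory properties."""
    return {
        "yarn.nodemanager.resource.memory-mb": containers * ram_per_container,
        "yarn.scheduler.minimum-allocation-mb": ram_per_container,
        "yarn.scheduler.maximum-allocation-mb": containers * ram_per_container,
        "mapreduce.map.memory.mb": ram_per_container,
        "mapreduce.reduce.memory.mb": 2 * ram_per_container,
    }


# Data node from the question: 24 cores, 125 GB RAM.
# Assumed (hypothetical): 12 disks, 24 GB reserved, 2048 MB minimum container.
containers, ram_mb = container_sizing(
    cores=24, disks=12, total_ram_mb=125 * 1024, reserved_mb=24 * 1024, min_container_mb=2048
)
for prop, value in yarn_settings(containers, ram_mb).items():
    print(f"{prop} = {value}")
```

With these assumed inputs the disk count is the limiting factor, which is why knowing the real number of data disks per node matters before applying any of the values.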


4 REPLIES

Expert Contributor

@Sagar Shimpi

Thank you. I will try the tuning and let you know the result.

Super Guru

@sankar rao

Killing jobs is not the right way to manage your production environment. Global tuning can help some jobs and hurt others. You need to understand your jobs and their resource requirements. You could size containers so that you maximize the use of resources, and you could do a lot with Tez parameters, but it again boils down to understanding your jobs and their requirements. You need to identify the jobs that use a lot of resources and optimize them with design techniques. You also need to manage your queues and their allocated resources, and assign applications or users to specific queues based on their workload and SLA needs. Last but not least, plan proactively for adding more resources to your cluster, setting alert thresholds so that you can preempt YARN reaching 100%.
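One way to picture the alert-threshold idea is a small check on cluster memory usage. This is only a sketch: it assumes the ResourceManager REST endpoint `/ws/v1/cluster/metrics` returns a `clusterMetrics` object with `totalMB` and `allocatedMB`, and the 80% and 95% thresholds are arbitrary illustrative values, not recommendations from this thread. The sample numbers are made up.

```python
def memory_alert(cluster_metrics, warn_pct=80.0, crit_pct=95.0):
    """Classify YARN memory usage as OK, WARNING, or CRITICAL.

    cluster_metrics is the "clusterMetrics" object from /ws/v1/cluster/metrics
    (assumed field names: totalMB, allocatedMB).
    """
    total = cluster_metrics["totalMB"]
    allocated = cluster_metrics["allocatedMB"]
    used_pct = 100.0 * allocated / total if total else 0.0
    if used_pct >= crit_pct:
        level = "CRITICAL"
    elif used_pct >= warn_pct:
        level = "WARNING"
    else:
        level = "OK"
    return level, round(used_pct, 1)


# Made-up samples, not real cluster output.
full_cluster = {"totalMB": 500000, "allocatedMB": 500000}
busy_cluster = {"totalMB": 500000, "allocatedMB": 425000}
idle_cluster = {"totalMB": 500000, "allocatedMB": 100000}

for metrics in (full_cluster, busy_cluster, idle_cluster):
    level, pct = memory_alert(metrics)
    print(f"{level}: {pct}% of YARN memory allocated")
```

Run on a schedule (for example from cron or a monitoring agent), a check like this raises a warning before the cluster is completely full, leaving time to rebalance queues or add capacity.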

Expert Contributor

Thank you @Constantin Stanca

I have learned a lot from your answer. I have a few questions about it; could you please help me with these?

# How can I set alerts to preempt YARN reaching 100%?

# How can I achieve this: "you could do a lot of things with Tez parameters"?