YARN Cluster memory issue


I am not much of a developer, and I am trying to help with a POC. I have an 8-node 2.6.5 cluster with the following configuration:

  • 1 edge node (62 GB RAM 16 core)
  • 2 Namenodes (62 GB RAM 16 core & 12 GB RAM 4 core)
  • 5 data nodes (187 GB RAM 32 cores each)

My cluster users are experiencing resource issues: when 2 users run Spark through Zeppelin notebooks, it clogs the cluster, literally consuming 93% of the resources. I have tried running the YARN Utility Script, but I think I am getting mixed up. HBase is installed, so based on the attached screenshots I am passing the following parameters to the script:

python -c 32 -m 187 -d 7 -k True

My reference is the Hortonworks YARN reference. After the script ran successfully, I changed the YARN and MapReduce settings according to its recommendations, but I end up with only 11 cores. What am I doing wrong?

What is the correct way of running the script, taking into account the memory and cores available? How should I configure the Spark environment so that it does not use up all the memory, or so that it releases the memory once the job is done?

NB: I have also isolated the users; see the attached user isolation/scoped jpg.

I just feel I am not doing the right thing.



1. Spark Dynamic allocation
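I believe your Zeppelin is configured to spawn as many executors as possible for Spark. Please enable dynamic allocation for Spark in Zeppelin, so that idle executors are released back to YARN instead of being held for the lifetime of the notebook.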
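
As a rough sketch of what that could look like, the snippet below collects the standard Spark dynamic allocation properties and prints them in a form you can copy into the Zeppelin Spark interpreter settings. The property names are Spark's own; the values (executor cap, idle timeout) and the helper name `to_conf_flags` are only assumptions for illustration, so adjust them to your cluster.

```python
# Illustrative values only (assumptions); tune them for your own cluster.
DYNAMIC_ALLOCATION_PROPS = {
    "spark.dynamicAllocation.enabled": "true",
    # On YARN the external shuffle service must run on the NodeManagers,
    # so executors can be removed without losing their shuffle files.
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "0",
    # Assumed cap so a single notebook cannot grab the whole cluster.
    "spark.dynamicAllocation.maxExecutors": "10",
    # Idle executors are handed back to YARN after this long.
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
}


def to_conf_flags(props):
    """Render the properties as spark-submit style --conf flags."""
    return " ".join(f"--conf {key}={value}" for key, value in props.items())


# Print one "name value" pair per line, as entered in interpreter settings.
for name, value in DYNAMIC_ALLOCATION_PROPS.items():
    print(name, value)

print(to_conf_flags(DYNAMIC_ALLOCATION_PROPS))
```

After changing the interpreter properties, restart the Spark interpreter in Zeppelin so that the new session picks them up.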

2. YARN Queue User Limit
Can you also check what your YARN queue configuration is?
You can limit how much of a queue's resources (and therefore how many containers) a given user can take by setting the user limit factor.
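
To get a feel for the effect, here is a small sketch of how the Capacity Scheduler's user limit factor bounds a single user: the user can grow to the queue's configured capacity multiplied by the factor, but never beyond the queue's maximum capacity. The queue name `root.default`, the percentages, and the function name `user_cap_percent` are assumptions for illustration, not values from your cluster.

```python
def user_cap_percent(queue_capacity_pct, user_limit_factor,
                     queue_max_capacity_pct=100.0):
    """Upper bound (in % of cluster resources) that one user can reach.

    A user may use up to queue capacity * user-limit-factor, and the
    result is capped by the queue's maximum capacity.
    """
    return min(queue_capacity_pct * user_limit_factor, queue_max_capacity_pct)


# Hypothetical queue: the real property names follow the pattern
# yarn.scheduler.capacity.<queue-path>.user-limit-factor
QUEUE = "root.default"  # assumed queue path

for factor in (0.5, 1, 2):
    cap = user_cap_percent(queue_capacity_pct=50, user_limit_factor=factor,
                           queue_max_capacity_pct=80)
    print(f"yarn.scheduler.capacity.{QUEUE}.user-limit-factor={factor}"
          f" -> one user can reach about {cap}% of the cluster")
```

With a factor of 1 or lower, one Zeppelin user stays within the queue's configured capacity, which leaves room for the second user.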
