Support Questions

Namenode and Datanode capacity planning

Contributor

To Hadoop Gurus,

I am new to cluster planning and need some direction on capacity planning for a Hadoop cluster.

The production cluster will consist of:

1) Node 1: Namenode

2) Node 2: Resource Manager Node

3) Node 3: Standby Name node

4) Datanodes

But I am not sure how much RAM will be required for the NameNode and each DataNode, or how many CPUs. The block size is 128 MB. Please let me know what else needs to be considered when selecting the RAM size and number of CPUs.

Thank you,

SA

6 REPLIES

Expert Contributor

@Sachin Ambardekar There is documentation at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_cluster-planning/content/ch_hardware-reco... that discusses overall cluster planning. Things like memory sizing, configurations for different types of nodes (masters vs. workers), and other hardware considerations are detailed at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_cluster-planning/content/server-node.1.ht....

Contributor

As per the link, http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_cluster-planning/content/hardware-for-sla...

"Depending on the number of cores, your slave nodes typically require 24 GB to 48 GB of RAM for Hadoop applications"

Does this mean that each datanode should have 24 GB to 48 GB of RAM and a quad-core CPU?

Please advise.

Thank you,

Sachin A

Contributor

Any update?

Thank you,

Sachin A

Master Mentor

@Sachin Ambardekar

- The NameNode heap size depends on many factors, such as the number of files, the number of blocks, and the load on the system. The link below shows how much heap is usually needed for the NameNode based on the number of files; the same sizing applies to the Standby NameNode. You can then plan the -Xmx heap setting and the RAM on the NameNode host accordingly.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_command-line-installation/content/ref-80...
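
For a rough back-of-the-envelope estimate, here is a sketch of my own (not the official sizing table in the link above). It assumes roughly 150 bytes of NameNode heap per HDFS object, that is, per file, directory, or block, plus a 2x safety factor for load and GC headroom:

# Rough NameNode heap estimate. This is a hypothetical helper, not part of the
# HDP docs; use the sizing table in the link above for real planning.
# Assumption: ~150 bytes of heap per HDFS object (file/directory/block),
# doubled as a safety margin for load and GC headroom.
def estimate_namenode_heap_gb(num_files, avg_file_size_mb, block_size_mb=128,
                              bytes_per_object=150, safety_factor=2.0):
    blocks_per_file = max(1, -(-avg_file_size_mb // block_size_mb))   # ceiling division
    total_objects = num_files * (1 + blocks_per_file)                 # each file plus its blocks
    return total_objects * bytes_per_object * safety_factor / (1024 ** 3)

# Example: 20 million files averaging 256 MB, with the 128 MB block size you mentioned
print(round(estimate_namenode_heap_gb(20_000_000, 256), 1), "GB of heap")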

- Similarly, for YARN (the ResourceManager), the HDP utility script is the recommended method for calculating HDP memory configuration settings; information about manually calculating YARN and MapReduce memory configuration settings is also provided for reference. See the link below.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_command-line-installation/content/determ...

Example:

#  python yarn-utils.py -c 16 -m 64 -d 4 -k True

OUTPUT
======
Using cores=16 memory=64GB disks=4 hbase=True
Profile: cores=16 memory=49152MB reserved=16GB usableMem=48GB disks=4 
Num Container=8
Container Ram=6144MB 
Used Ram=48GB
Unused Ram=16GB
yarn.scheduler.minimum-allocation-mb=6144 
yarn.scheduler.maximum-allocation-mb=49152 
yarn.nodemanager.resource.memory-mb=49152 
mapreduce.map.memory.mb=6144 
mapreduce.map.java.opts=-Xmx4096m 
mapreduce.reduce.memory.mb=6144 
mapreduce.reduce.java.opts=-Xmx4096m 
yarn.app.mapreduce.am.resource.mb=6144 
yarn.app.mapreduce.am.command-opts=-Xmx4096m 
mapreduce.task.io.sort.mb=1792 
tez.am.resource.memory.mb=6144 
tez.am.launch.cmd-opts =-Xmx4096m 
hive.tez.container.size=6144 
hive.tez.java.opts=-Xmx4096m
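
For reference, the manual calculation behind those numbers looks roughly like the sketch below. This is my own approximation of the formulas from the doc above, not the actual yarn-utils.py script; the reserved-memory values used here are the HDP sizing-table entries for a 64 GB node and will differ for other node sizes.

# Approximate re-implementation of the manual YARN memory calculation.
# Assumptions: reserved_gb and hbase_gb are the HDP sizing-table values for a
# 64 GB node; the rounding in the real yarn-utils.py may differ slightly.
import math

def yarn_settings(cores, memory_gb, disks, reserved_gb=8, hbase_gb=8):
    usable_mb = (memory_gb - reserved_gb - hbase_gb) * 1024
    min_container_mb = 2048                     # nodes with more than 24 GB of RAM
    containers = int(min(2 * cores,
                         math.ceil(1.8 * disks),
                         usable_mb // min_container_mb))
    ram_per_container = max(min_container_mb, usable_mb // containers)
    return {
        "yarn.nodemanager.resource.memory-mb": containers * ram_per_container,
        "yarn.scheduler.minimum-allocation-mb": ram_per_container,
        "yarn.scheduler.maximum-allocation-mb": containers * ram_per_container,
        "mapreduce.map.memory.mb": ram_per_container,
        "mapreduce.reduce.memory.mb": ram_per_container,
    }

# Reproduces the example above: 16 cores, 64 GB RAM, 4 disks, HBase installed
print(yarn_settings(16, 64, 4))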

Contributor

@Sachin Ambardekar, the doc above may be slightly dated. As a rule of thumb, 4 GB per core seems to be the sweet spot for memory-intensive workloads, which are getting more common nowadays.
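
Applied to the worked example earlier in the thread, that rule of thumb lines up with the yarn-utils.py inputs (a minimal sketch; 4 GB per core is a guideline, not a hard requirement):

# Sizing a worker node by the 4 GB-per-core rule of thumb (guideline only).
cores = 16
ram_gb = cores * 4      # 64 GB, which matches the -c 16 -m 64 inputs above
print(cores, "cores ->", ram_gb, "GB RAM")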