Created 03-02-2017 10:22 PM
To Hadoop Gurus,
I am new to cluster planning and need some direction on capacity planning for a Hadoop cluster.
The production cluster will be on
1) Node 1: NameNode
2) Node 2: Resource Manager Node
3) Node 3: Standby NameNode
I am not sure how much RAM is required for the NameNode and for each DataNode, or how many CPUs are needed. The block size is 128 MB. Let me know what else needs to be considered when selecting the RAM size and number of CPUs.
Created 03-02-2017 10:25 PM
Created 03-02-2017 10:51 PM
@Sachin Ambardekar There is documentation at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_cluster-planning/content/ch_hardware-reco... that discusses overall cluster planning. Things like memory sizing, configurations for different types of nodes (masters vs. workers), and other hardware considerations are detailed at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_cluster-planning/content/server-node.1.ht....
Created 03-02-2017 11:40 PM
As per link, http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_cluster-planning/content/hardware-for-sla...
"Depending on the number of cores, your slave nodes typically require 24 GB to 48 GB of RAM for Hadoop applications"
Does this mean that each datanode should have 24GB - 48GB RAM and quad core CPU?
Created 03-11-2017 08:11 PM
Created 03-12-2017 03:00 AM
- The NameNode heap size depends on many factors, such as the number of files, the number of blocks, and the load on the system. Refer to the NameNode heap sizing guidance to see how much heap is typically needed for a given number of files; the same applies to the Standby NameNode. Plan the -Xmx heap size and the RAM on the NameNode host accordingly.
- Similarly, for YARN (for example the Resource Manager), the HDP utility script is the recommended method for calculating HDP memory configuration settings. Information about manually calculating the YARN and MapReduce memory settings is also provided in the documentation for reference. For example:
# python yarn-utils.py -c 16 -m 64 -d 4 -k True
OUTPUT
======
Using cores=16 memory=64GB disks=4 hbase=True
Profile: cores=16 memory=49152MB reserved=16GB usableMem=48GB disks=4
Num Container=8
Container Ram=6144MB
Used Ram=48GB
Unused Ram=16GB
yarn.scheduler.minimum-allocation-mb=6144
yarn.scheduler.maximum-allocation-mb=49152
yarn.nodemanager.resource.memory-mb=49152
mapreduce.map.memory.mb=6144
mapreduce.map.java.opts=-Xmx4096m
mapreduce.reduce.memory.mb=6144
mapreduce.reduce.java.opts=-Xmx4096m
yarn.app.mapreduce.am.resource.mb=6144
yarn.app.mapreduce.am.command-opts=-Xmx4096m
mapreduce.task.io.sort.mb=1792
tez.am.resource.memory.mb=6144
tez.am.launch.cmd-opts =-Xmx4096m
hive.tez.container.size=6144
hive.tez.java.opts=-Xmx4096m
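For anyone who wants to see where "Num Container=8" and "Container Ram=6144MB" come from, here is a rough sketch of the manual calculation. It is not the yarn-utils.py source. The function name `estimate_containers` is made up for illustration, the 16 GB reserved figure is simply taken from the output above, and the 2048 MB minimum container size is an assumption (the value usually quoted for nodes with more than 24 GB of RAM). The java.opts and io.sort.mb values are not derived here.

```python
import math


def estimate_containers(cores, disks, usable_mem_mb, min_container_mb):
    """Hypothetical helper: estimate YARN container count and size per node.

    Uses the commonly documented manual rule:
      containers = min(2 * cores, 1.8 * disks, usable RAM / min container size)
      RAM per container = max(min container size, usable RAM / containers)
    """
    containers = int(min(2 * cores,
                         math.ceil(1.8 * disks),
                         usable_mem_mb // min_container_mb))
    ram_per_container_mb = max(min_container_mb, usable_mem_mb // containers)
    return containers, ram_per_container_mb


# Inputs matching the yarn-utils.py run above: 16 cores, 64 GB RAM, 4 disks.
total_mem_mb = 64 * 1024
reserved_mem_mb = 16 * 1024   # taken from "reserved=16GB" in the output
min_container_mb = 2048       # assumption, see the note above

usable_mem_mb = total_mem_mb - reserved_mem_mb
containers, ram_mb = estimate_containers(16, 4, usable_mem_mb, min_container_mb)

print("usable memory (MB):", usable_mem_mb)
print("containers:", containers)
print("RAM per container (MB):", ram_mb)
```

With these inputs the disk count is the limiting factor, so adding disks (not only RAM) changes how many containers a worker node can run.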
Created 03-14-2017 05:47 AM
@Sachin Ambardekar, the doc above may be slightly dated. As a rule of thumb, 4 GB of RAM per core seems to be the sweet spot for memory-intensive workloads, which are becoming more common.
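To put that rule of thumb into numbers, a tiny sketch is below. The helper name `ram_for_cores` is made up for illustration, and the 4 GB per core ratio is only the rule of thumb from this reply, not a hard requirement.

```python
def ram_for_cores(cores, gb_per_core=4):
    """Hypothetical helper: RAM (GB) suggested by a GB-per-core rule of thumb."""
    return cores * gb_per_core


# Suggested RAM for a few common worker-node core counts.
for cores in (4, 8, 16):
    print(cores, "cores ->", ram_for_cores(cores), "GB RAM")
```

Note that the 16-core, 64 GB node used in the yarn-utils.py example earlier in the thread already sits at this 4 GB per core ratio.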