For yarn-utils.py, I need to input the CPU cores and DISKS for each host. How do I count the CPU cores on a host? Just physical CPUs, or cores, or hyperthreads?
And especially for DISKS: is it the total number of physical disks on the host, including the disk the OS is installed on, or just the disks used for HDFS? How do I determine the DISKS value if we use RAID5 (12 physical disks): is DISKS=1 or DISKS=12?
With fewer DISKS installed on a host, the formula
# of containers = min(2*CORES, 1.8*DISKS, (total available RAM) / MIN_CONTAINER_SIZE)
is dominated by the 1.8*DISKS term, so to avoid wasting memory, do I need to set DISKS to a larger value?
I want more containers to support more concurrent applications. How can I get more containers in the cluster if the hosts have only a few physical disks?
I'm confused by how memory, CPU, DISKS, the number of containers, and the number of concurrently running applications relate to each other.
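To see why DISKS dominates on hosts with few disks, here is a minimal sketch of the container-count formula in Python. The function names are hypothetical (not from yarn-utils.py itself), and the MIN_CONTAINER_SIZE thresholds follow the HDP tuning table as I recall it, so treat the exact cutoffs as assumptions:

```python
def min_container_size_mb(total_ram_gb):
    """Recommended minimum container size in MB by total RAM
    (thresholds assumed from the HDP memory-tuning table)."""
    if total_ram_gb <= 4:
        return 256
    elif total_ram_gb <= 8:
        return 512
    elif total_ram_gb <= 24:
        return 1024
    else:
        return 2048

def num_containers(cores, disks, total_ram_gb, reserved_gb=0):
    """# of containers = min(2*CORES, 1.8*DISKS, available RAM / MIN_CONTAINER_SIZE)."""
    available_mb = (total_ram_gb - reserved_gb) * 1024
    return int(min(2 * cores,
                   1.8 * disks,
                   available_mb / min_container_size_mb(total_ram_gb)))

# 16 cores, 12 JBOD disks, 64 GB RAM, 8 GB reserved for OS/HBase:
print(num_containers(16, 12, 64, reserved_gb=8))  # -> 21 (the 1.8*DISKS term wins)
# Same host, but the RAID5 array counted as a single disk:
print(num_containers(16, 1, 64, reserved_gb=8))   # -> 1
```

The second call illustrates the question: if you report a 12-disk RAID5 array as DISKS=1, the formula starves the host down to a single container, wasting almost all of its RAM and CPU.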
For HDP 2.6.1, the documentation (https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_command-line-installation/content/determine-hdp-memory-config.html) points to a download link that apparently no longer exists.
Can anyone point me in the right direction?
It's not entirely clear to me either what the right value of DISKS is, especially in the RAID case. The documentation you link says:
DISKS is the value for *dfs.datanode.data.dir* (number of data disks) per machine.
HDP v2.5 - dfs.datanode.data.dir
HDP v2.6 - dfs.data.dirs
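In other words, DISKS counts the entries in that property, i.e. one per mounted data directory, not one per physical spindle. A quick illustrative sketch (the paths and helper name are hypothetical):

```python
def disks_from_data_dirs(data_dirs):
    """DISKS = number of comma-separated entries in dfs.datanode.data.dir."""
    return len([d for d in data_dirs.split(",") if d.strip()])

# A 12-disk RAID5 array exposed as one filesystem is one data directory:
print(disks_from_data_dirs("/grid/raid5/hdfs/data"))  # -> 1

# Twelve JBOD disks, each with its own mount point and data directory:
jbod = ",".join("/grid/%d/hdfs/data" % i for i in range(12))
print(disks_from_data_dirs(jbod))  # -> 12
```

So by this reading, a RAID5 array mounted as a single filesystem would give DISKS=1; this is one reason JBOD is usually recommended over RAID for DataNodes.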