If the number of DataNodes increases, do we need to upgrade the NameNode?
The NameNode is the one component of a Hadoop cluster that should be carefully planned and should NOT run on commodity hardware, unlike the DataNodes. It should be deployed on reliable hardware because it is the centrepiece of HDFS.
Also known as the master node, the NameNode stores the metadata: the number of blocks, their locations, their replicas and other details. This metadata is kept in memory on the master for faster retrieval. The NameNode also maintains and manages the slave nodes (the DataNodes) and directs their block operations, such as replication.
Having said that, adding extra DataNodes doesn't necessarily require you to upgrade your NameNode. The NameNode needs enough memory (RAM) to hold all of the metadata in memory for fast access. See the Hortonworks documentation on NameNode heap size settings.
The memory you need depends on the number of files tracked by your NameNode. Hortonworks recommends a maximum of 300 million files on the NameNode. With an approximation of 1 GB per 1 million files, that works out to about 300 GB, so 512 GB of memory would be more than enough.
The NameNode stores only the metadata for the blocks held on the DataNodes, not the data itself. It uses roughly 150 bytes of memory per block.
Generally, it is recommended to allocate 1 GB of memory (RAM) for every 1 million blocks.
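To make the arithmetic concrete, here is a rough sketch in Python that compares the two figures above: the raw 150 bytes per block and the 1 GB per 1 million blocks rule of thumb. The function names are made up for illustration, and the numbers are only the approximations quoted in this answer, not an official sizing formula.

```python
# Rough NameNode memory estimate, using only the two figures quoted above.
# Function names are illustrative, not part of any Hadoop API.

BYTES_PER_BLOCK = 150        # approximate metadata cost per block
BLOCKS_PER_GB_RULE = 1_000_000  # rule of thumb: 1 GB of RAM per 1 million blocks
GB = 1024 ** 3


def raw_metadata_gb(num_blocks: int) -> float:
    """Memory taken by block metadata alone, at ~150 bytes per block."""
    return num_blocks * BYTES_PER_BLOCK / GB


def recommended_heap_gb(num_blocks: int) -> float:
    """Heap suggested by the 1 GB per 1 million blocks rule of thumb."""
    return num_blocks / BLOCKS_PER_GB_RULE


if __name__ == "__main__":
    for blocks in (1_000_000, 50_000_000, 300_000_000):
        print(
            f"{blocks:>11,} blocks: "
            f"raw ~{raw_metadata_gb(blocks):.2f} GB, "
            f"recommended ~{recommended_heap_gb(blocks):.0f} GB"
        )
```

The rule of thumb comes out several times larger than the raw per-block figure, which is the point: it leaves headroom for file and directory objects, JVM overhead and growth.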
Based on the above recommendation, we can determine the NameNode's requirements when installing the Hadoop system by considering the planned size of the cluster. Since the NameNode stores only metadata, it is rare that a need to upgrade it arises. If it does, the NameNode can be scaled vertically, for example by adding more RAM.
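As a sanity check before adding DataNodes, you could estimate how many extra blocks the new capacity can hold and whether the current heap still covers the total. The sketch below is only an approximation under stated assumptions: the helper names are hypothetical, it assumes the default 128 MB block size and replication factor of 3, and it assumes the new space is filled with full-size blocks. Many small files would produce far more blocks per TB than this.

```python
import math

# Hypothetical helpers for a back-of-the-envelope check; not a Hadoop API.


def blocks_for_capacity(raw_capacity_tb: float,
                        block_size_mb: int = 128,
                        replication: int = 3) -> int:
    """Unique blocks that fit in the given raw capacity.

    Assumes every block is full size; replicas consume raw disk space
    but are counted here as a single block each.
    """
    usable_mb = raw_capacity_tb * 1024 * 1024 / replication
    return math.ceil(usable_mb / block_size_mb)


def needs_more_heap(current_heap_gb: float, total_blocks: int) -> bool:
    """True if the 1 GB per 1 million blocks rule exceeds the current heap."""
    required_gb = total_blocks / 1_000_000
    return required_gb > current_heap_gb


if __name__ == "__main__":
    # Example: 10 new DataNodes with 48 TB of raw disk each (made-up numbers).
    extra_blocks = blocks_for_capacity(10 * 48)
    existing_blocks = 20_000_000  # made-up current block count
    total = existing_blocks + extra_blocks
    print(f"extra blocks: {extra_blocks:,}")
    print(f"upgrade needed with 64 GB heap: {needs_more_heap(64, total)}")
```

With these made-up numbers, 480 TB of new raw disk adds only about 1.3 million blocks, so a NameNode sized with reasonable headroom would not need an upgrade.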