In large clusters, where the node count runs to a few hundred, the master services are expected to be busy. One such master service is the NameNode (NN). Some of the critical activities the NN performs include:
1. Addressing client requests, which includes verifying permissions and running auth checks for HDFS resources.
2. Constantly monitoring block reports from all the DataNodes.
3. Updating the service and audit logs.
These are just a few examples.
In certain situations, such as a rogue application that tries to access many resources in HDFS or a data ingestion job that loads high data volumes, the NN tends to be very busy. In such situations, and in clusters like these, the NN FSImage tends to be in $$GB. Operations such as checkpointing therefore consume considerable bandwidth between the two NameNodes. Likewise, a high volume of edits syncing combined with logging causes high disk utilization, which can lead to NameNode instability. It is therefore recommended to have dedicated disks for service logs and edit logs.
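As a rough way to check whether the edit logs and service logs already sit on separate storage, the following minimal sketch compares the device IDs of two directories. The two paths in the usage section are hypothetical placeholders, not values from this cluster; substitute the directory configured in `dfs.namenode.edits.dir` and the NameNode log directory. Note that the check compares filesystem devices, so two partitions carved from one physical disk would still look "separate".

```python
import os


def shares_device(path_a: str, path_b: str) -> bool:
    """Return True when both paths sit on the same filesystem device."""
    # st_dev identifies the device holding the filesystem. Two partitions of
    # one physical disk report different values, so a False result does not
    # prove that the underlying disks are separate.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Hypothetical locations: replace with the cluster's edits directory
    # (dfs.namenode.edits.dir) and the NameNode service log directory.
    edits_dir = "/hadoop/hdfs/namenode"
    log_dir = "/var/log/hadoop/hdfs"
    try:
        if shares_device(edits_dir, log_dir):
            print("edit logs and service logs share a device")
        else:
            print("edit logs and service logs are on different devices")
    except OSError as err:
        print(f"could not stat {err.filename}: {err.strerror}")
```

If the two directories share a device, edit syncs and log writes compete for the same I/O queue, which is the situation the recommendation above is meant to avoid.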
We can monitor disk I/O using the output of `iostat`.
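To illustrate how that output might be checked automatically, here is a small sketch that scans the extended device report (for example, text captured from `iostat -dx`) and lists devices whose `%util` is at or above a threshold. The function name `busy_devices` and the 80% default are assumptions made for this example, not recommended values, and the column layout of `iostat` differs between sysstat versions, so the parser locates `%util` from the header line rather than by fixed position.

```python
def busy_devices(iostat_text: str, threshold: float = 80.0) -> dict[str, float]:
    """Return {device: peak %util} for devices at or above the threshold."""
    peak: dict[str, float] = {}
    util_col = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            util_col = None  # a blank line ends the current device table
            continue
        # The header starts with "Device" or "Device:" depending on version.
        if fields[0].rstrip(":") == "Device":
            util_col = fields.index("%util") if "%util" in fields else None
            continue
        if util_col is not None and len(fields) > util_col:
            # Some locales print a decimal comma instead of a point.
            util = float(fields[util_col].replace(",", "."))
            # Keep the highest value when the text holds several reports.
            peak[fields[0]] = max(util, peak.get(fields[0], 0.0))
    return {dev: util for dev, util in peak.items() if util >= threshold}
```

The text can come from a saved file or from running `iostat -dx` through `subprocess.run(..., capture_output=True, text=True)`. A device hosting the edit logs or service logs that repeatedly appears in the result is a candidate for moving onto its own disk.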