ROOT CAUSE: GC errors were preventing the NameNode from starting.
RESOLUTION: The following steps were taken to resolve the issue -
1. Logged in to the NameNode host CLI.
2. Checked from the CLI using "ps -ef | grep -i namenode"; the NameNode process was not listed.
3. It appeared that the NameNode process was getting killed after a specific interval of time, but Ambari was still showing the NameNode state in the UI as "Starting".
4. Cancelled the NameNode start operation from the Ambari UI.
5. Tried starting the whole HDFS service and simultaneously ran "iostat" against the fsimage disk.
6. In the iostat output, the "Blk_read/s" column was not showing any value.
7. The NameNode process was still getting killed.
8. Enabled debug logging using "export HADOOP_ROOT_LOGGER=DEBUG,console" and ran the command "hadoop namenode".
9. The logs from the above command showed that the NameNode was failing with a GC issue.
10. We suggested the customer increase the NameNode heap size from 3 GB to 4 GB, after which the customer was able to start the NameNodes.
11. As per the NameNode heap recommendations at "https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html"
12. Increased the NameNode heap size from "5376m" to "8072m", as there were approximately 10 million files on the cluster.
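The diagnostic steps above can be sketched as shell commands. This is a minimal sketch, not the exact session from the case: the iostat device name is a placeholder, and the iostat/hadoop lines are shown commented because they only make sense on an actual cluster node.

```shell
# 1. Confirm whether the NameNode JVM is actually running. The [n] bracket
#    trick prevents grep from matching its own command line in the ps output.
ps -ef | grep -i '[n]amenode' || echo "NameNode process not running"

# 2. While starting HDFS, watch read throughput on the fsimage disk
#    (sdb is a placeholder device); a Blk_read/s column showing no value
#    suggests the NameNode is not even reading the fsimage:
#      iostat 5 sdb

# 3. Re-run the NameNode in the foreground with DEBUG logging so GC errors
#    surface directly on the console:
#      export HADOOP_ROOT_LOGGER=DEBUG,console
#      hadoop namenode
```

Running the NameNode in the foreground this way is useful precisely because a process that dies shortly after start leaves Ambari showing a stale "Starting" state, while the console output reveals the real failure.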
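For reference, the heap increase from step 12 would look roughly like the fragment below in hadoop-env.sh. This is an assumption about how the value was applied; on an Ambari-managed cluster the same change should be made through the HDFS "NameNode Java heap size" configuration field rather than by editing the file directly.

```shell
# hadoop-env.sh fragment (sketch; heap values taken from this case).
# -Xms/-Xmx set the initial and maximum JVM heap for the NameNode.
export HADOOP_NAMENODE_OPTS="-Xms8072m -Xmx8072m ${HADOOP_NAMENODE_OPTS}"
```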