In large clusters, restarting the NameNode or the secondary NameNode sometimes fails, and Ambari keeps retrying multiple times before giving up.
One quick fix is to increase Ambari's timeouts from 5s to 25s.
In /var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py, change the timeout values:
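The exact code can vary slightly between Ambari versions; in the hdfs_namenode.py shipped with this Ambari line, the 5-second values are typically the times and sleep_time arguments of the @retry decorator on is_this_namenode_active(), so the change looks roughly like the sketch below. Verify against your own copy of the file before editing.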
From this:
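# Sketch of the original decorator (the exact line may differ by Ambari version)
@retry(times=5, sleep_time=5, backoff_factor=2, err_class=Fail)
def is_this_namenode_active():
  ...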
To this:
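# Same decorator with the retry count and sleep interval raised from 5 to 25,
# giving the NameNode more time to report Active/Standby before Ambari gives up
@retry(times=25, sleep_time=25, backoff_factor=2, err_class=Fail)
def is_this_namenode_active():
  ...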
If it still fails, there is another thing to try.
One of the root causes of this can be the Solr audit logs (from Ambari Infra), which can create huge log files that need to be written to HDFS.
You can clear these logs on the NN and SNN here: /var/log/hadoop/hdfs/audit/solr/spool
Be careful: delete only on the standby NN, then fail over and delete from the other server. Do not delete the logs while the NameNode is active.
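As a rough safety guard for that rule, the sketch below (an assumed helper, not part of the original procedure) only clears the spool directory when the local NameNode reports standby via hdfs haadmin -getServiceState; the "nn1" service ID and the spool path are placeholders to adjust for your cluster.

# Minimal sketch: clear the Ambari Infra Solr audit spool only when the local
# NameNode is in standby state. "nn1" is a placeholder HA NameNode ID.
import glob
import os
import subprocess

SPOOL_DIR = "/var/log/hadoop/hdfs/audit/solr/spool"
NAMENODE_ID = "nn1"  # assumption: the HA NameNode ID of this host

def namenode_is_standby(nn_id=NAMENODE_ID):
    """Return True if `hdfs haadmin -getServiceState <nn_id>` reports standby."""
    result = subprocess.run(
        ["hdfs", "haadmin", "-getServiceState", nn_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip().lower() == "standby"

def clear_spool(spool_dir=SPOOL_DIR):
    """Delete spooled Solr audit files; run this only on the standby NameNode."""
    for path in glob.glob(os.path.join(spool_dir, "*")):
        if os.path.isfile(path):
            os.remove(path)

if __name__ == "__main__":
    if namenode_is_standby():
        clear_spool()
    else:
        print("Refusing to clear spool: this NameNode is not in standby state.")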