In large clusters, restarting the NameNode or Secondary NameNode sometimes fails, and Ambari keeps retrying several times before giving up.

One thing you can do quickly is increase Ambari's retry timeouts from 5s to 25s (or up to 50s).

In /var/lib/ambari-server/resources/common-services/HDFS/XXX-VERSION-XXX/package/scripts/hdfs_namenode.py

From this:

  @retry(times=5, sleep_time=5, backoff_factor=2, err_class=Fail)

To this:

  @retry(times=25, sleep_time=25, backoff_factor=2, err_class=Fail)

If it still fails, you can try:

  @retry(times=50, sleep_time=50, backoff_factor=2, err_class=Fail)
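For context, these parameters control how many times Ambari retries the NameNode start check and how long it sleeps between attempts. Below is a minimal sketch of how a decorator with these parameters typically behaves; it is plain Python for illustration, not the actual Ambari implementation, and the exponential-backoff behavior is an assumption based on the backoff_factor name:

  import time
  from functools import wraps

  def retry(times=5, sleep_time=5, backoff_factor=2, err_class=Exception):
      """Retry the wrapped call up to `times` attempts, sleeping
      `sleep_time` seconds between failed attempts and multiplying
      the sleep by `backoff_factor` after each failure."""
      def decorator(func):
          @wraps(func)
          def wrapper(*args, **kwargs):
              delay = sleep_time
              for attempt in range(1, times + 1):
                  try:
                      return func(*args, **kwargs)
                  except err_class:
                      if attempt == times:
                          raise            # out of attempts, give up
                      time.sleep(delay)
                      delay *= backoff_factor
          return wrapper
      return decorator

Raising times and sleep_time therefore gives the NameNode considerably more time to finish starting up (for example, replaying its edit logs) before Ambari marks the restart as failed.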

One of the root causes of this may be Solr audit logs (from Ambari Infra), which can generate huge log files that need to be written to HDFS.

Restart the Ambari server after editing the script so the change takes effect.

You can clear the spooled audit logs of the NameNode and Secondary NameNode here: /var/log/hadoop/hdfs/audit/solr/spool

Be careful to delete only on the Standby NameNode, then do a failover to delete from the other server. Do not delete logs while the NameNode is active.
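If you want to script that cleanup safely, here is a hedged sketch that checks the HA state before touching the spool directory. The service ID nn1 is a placeholder (list yours with hdfs getconf -confKey dfs.ha.namenodes.<nameservice>), and you should confirm that removing everything in the spool directory is acceptable in your environment:

  import glob
  import os
  import subprocess

  SPOOL_DIR = "/var/log/hadoop/hdfs/audit/solr/spool"
  NN_SERVICE_ID = "nn1"   # placeholder: the local NameNode's HA service ID

  def namenode_state(service_id):
      """Return 'active' or 'standby' as reported by `hdfs haadmin`."""
      out = subprocess.check_output(
          ["hdfs", "haadmin", "-getServiceState", service_id])
      return out.decode().strip()

  if namenode_state(NN_SERVICE_ID) == "standby":
      for path in glob.glob(os.path.join(SPOOL_DIR, "*")):
          if os.path.isfile(path):
              os.remove(path)
      print("Cleared spool files on the standby NameNode.")
  else:
      print("NameNode is active here; do not delete the spool files now.")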
