Community Articles

kkanchu · 09-19-2016

During an Express Upgrade (EU) of an HDP 2.4.2 / Ambari 2.2.2 cluster that had been deployed from a Blueprint configured for HDFS NameNode HA, the NameNode (NN) failed to restart with the following output:

13debf893a605e8a88df18a7d8d214f571e05289; compiled by 'jenkins' on 2016-04-25T05:46Z
STARTUP_MSG: java = 1.8.0_60
************************************************************/
16/09/14 18:40:33 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
16/09/14 18:40:33 INFO namenode.NameNode: createNameNode [-bootstrapStandby, -nonInteractive]
16/09/14 18:40:34 WARN common.Util: Path /hadoopfs/fs1/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
16/09/14 18:40:34 WARN common.Util: Path /hadoopfs/fs1/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
16/09/14 18:40:35 INFO ipc.Client: Retrying connect to server: <IP ADDRESS>:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)

RCA:

The problem is that a Blueprint deployment performed by Cloudbreak leaves the following two properties in hadoop-env:

dfs_ha_initial_namenode_active
dfs_ha_initial_namenode_standby

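When it is later time to perform an EU or RU (or simply to start the NN), the presence of these properties makes Ambari treat NameNode HA setup as not yet complete. This matches the log above: the NameNode is launched with -bootstrapStandby -nonInteractive and then keeps retrying its connection to the other NameNode on port 8020. Out of the box, a Blueprint deployment with HA therefore needs a step that deletes these properties once the deployment has finished.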
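A simplified Python model of the failure may help make the root cause concrete. It is NOT Ambari's actual start-script code: it assumes that a host named in dfs_ha_initial_namenode_standby is treated as a fresh HA deployment and started with -bootstrapStandby, as the log above suggests. The function names and host names are hypothetical.

```python
# Simplified, hypothetical model of why leftover Blueprint HA properties
# break a later NameNode restart. This is NOT Ambari's actual code.
from typing import Dict, List

# The two hadoop-env properties that the Blueprint HA deployment leaves behind.
HA_INITIAL_KEYS = ("dfs_ha_initial_namenode_active", "dfs_ha_initial_namenode_standby")


def leftover_ha_keys(hadoop_env: Dict[str, str]) -> List[str]:
    """Return the initial-HA properties that are still present in hadoop-env."""
    return [key for key in HA_INITIAL_KEYS if key in hadoop_env]


def would_bootstrap_standby(hadoop_env: Dict[str, str], hostname: str) -> bool:
    """Assumed rule: a host named as the initial standby is started with
    -bootstrapStandby, which needs the other NameNode to be reachable
    (port 8020 in the log). During an EU that NameNode may be down."""
    return hadoop_env.get("dfs_ha_initial_namenode_standby") == hostname


if __name__ == "__main__":
    env = {
        "hdfs_log_dir_prefix": "/var/log/hadoop",  # an ordinary hadoop-env property
        "dfs_ha_initial_namenode_active": "nn1.example.com",  # hypothetical hosts
        "dfs_ha_initial_namenode_standby": "nn2.example.com",
    }
    print(leftover_ha_keys(env))
    print(would_bootstrap_standby(env, "nn2.example.com"))
```

Once both keys are deleted, leftover_ha_keys returns an empty list and, under this model, the restart no longer takes the bootstrap path.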

BUG: https://issues.apache.org/jira/browse/AMBARI-18394

WORKAROUND:

The workaround is to delete the "dfs_ha_initial_namenode_active" and "dfs_ha_initial_namenode_standby" properties from hadoop-env with configs.sh while the EU is paused at the failed step, and then retry the NameNode restart step.
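The two deletions can be scripted. The sketch below only builds and prints the configs.sh command lines, one delete per property. It assumes the Ambari 2.x script location /var/lib/ambari-server/resources/scripts/configs.sh and its "[-u user] [-p password] delete <AMBARI_HOST> <CLUSTER_NAME> <CONFIG_TYPE> <CONFIG_KEY>" form. The host, cluster name, credentials, and the helper name build_delete_commands are placeholders, so check the script's usage text on your Ambari server first.

```python
# Sketch: build the configs.sh commands for the AMBARI-18394 workaround.
# Assumptions: Ambari 2.x location of configs.sh and its
# "[-u user] [-p password] delete <AMBARI_HOST> <CLUSTER_NAME> <CONFIG_TYPE> <CONFIG_KEY>"
# form. The host, cluster, and credentials used below are placeholders.
import shlex
from typing import List

CONFIGS_SH = "/var/lib/ambari-server/resources/scripts/configs.sh"
HA_INITIAL_KEYS = ("dfs_ha_initial_namenode_active", "dfs_ha_initial_namenode_standby")


def build_delete_commands(ambari_host: str, cluster: str,
                          user: str, password: str) -> List[List[str]]:
    """Return one configs.sh 'delete' argv per leftover hadoop-env property."""
    return [
        [CONFIGS_SH, "-u", user, "-p", password,
         "delete", ambari_host, cluster, "hadoop-env", key]
        for key in HA_INITIAL_KEYS
    ]


if __name__ == "__main__":
    for argv in build_delete_commands("localhost", "mycluster", "admin", "admin"):
        # Print instead of executing; run each line on the Ambari server,
        # or pass argv to subprocess.run(argv, check=True).
        print(shlex.join(argv))
```

Keeping each command as an argv list means it can be handed to subprocess without shell quoting, which matters if the password contains special characters.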

Cloudera Community


Blueprint deployed NameNode HA failing on restarting NameNode during EU

Apache Ambari

Apache Hadoop
