Created on 09-19-2016 11:14 PM
During an Express Upgrade (EU) of a cluster running HDP 2.4.2 and Ambari 2.2.2 that was deployed from a Blueprint with HDFS HA configured, the NameNode (NN) failed to restart, as shown below.
13debf893a605e8a88df18a7d8d214f571e05289; compiled by 'jenkins' on 2016-04-25T05:46Z
STARTUP_MSG: java = 1.8.0_60
************************************************************/
16/09/14 18:40:33 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
16/09/14 18:40:33 INFO namenode.NameNode: createNameNode [-bootstrapStandby, -nonInteractive]
16/09/14 18:40:34 WARN common.Util: Path /hadoopfs/fs1/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
16/09/14 18:40:34 WARN common.Util: Path /hadoopfs/fs1/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
16/09/14 18:40:35 INFO ipc.Client: Retrying connect to server: <IP ADDRESS>:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
RCA:
The problem is that a Blueprint deployment by Cloudbreak leaves the following two configs in hadoop-env:
dfs_ha_initial_namenode_active
dfs_ha_initial_namenode_standby
When it is time to perform an Express or Rolling Upgrade (EU/RU), or to start the NN in general, Ambari sees these configs and assumes that NameNode HA setup is still not complete, which is why the log above shows the NN being started with -bootstrapStandby. A Blueprint that sets up HA out of the box needs a step that deletes these configs once the deployment is done.
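To confirm that a cluster is affected, one option is to look for the two keys in the current hadoop-env configuration. The helper below is only a sketch: `find_leftover_ha_keys` is a hypothetical name, and the sample input is made up to stand in for real output.

```shell
# Reads hadoop-env properties on stdin and prints any leftover
# HA bootstrap keys, one per line (hypothetical helper name).
find_leftover_ha_keys() {
    grep -oE 'dfs_ha_initial_namenode_(active|standby)' | sort -u
}

# Made-up sample text standing in for real hadoop-env output.
printf '%s\n' \
    '"dfs_ha_initial_namenode_active" : "nn1.example.com",' \
    '"dfs_ha_initial_namenode_standby" : "nn2.example.com",' \
    '"namenode_heapsize" : "1024m"' | find_leftover_ha_keys
```

On a real cluster, the input could come from `configs.sh get <AMBARI_HOST> <CLUSTER_NAME> hadoop-env`, assuming that command prints the properties to standard output on your Ambari version. If the helper prints nothing, the keys are not present.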
BUG: https://issues.apache.org/jira/browse/AMBARI-18394
WORKAROUND:
As a workaround, remove the configs "dfs_ha_initial_namenode_active" and "dfs_ha_initial_namenode_standby" from hadoop-env using configs.sh in the middle of the EU, then retry the step that restarts the NN.
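The removal could look roughly like the sketch below. The host name, cluster name, and credentials are placeholders, and the configs.sh path is the usual location on an Ambari server but is an assumption for your install. The script defaults to a dry run that only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with your own environment.
AMBARI_HOST="${AMBARI_HOST:-ambari.example.com}"
CLUSTER_NAME="${CLUSTER_NAME:-mycluster}"
AMBARI_USER="${AMBARI_USER:-admin}"
AMBARI_PASSWORD="${AMBARI_PASSWORD:-admin}"
# Assumed default location of configs.sh on the Ambari server host.
CONFIGS_SH="${CONFIGS_SH:-/var/lib/ambari-server/resources/scripts/configs.sh}"
# 1 = only print the commands, 0 = execute them.
DRY_RUN="${DRY_RUN:-1}"

# Deletes one key from the hadoop-env config type (hypothetical helper name).
remove_hadoop_env_key() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$CONFIGS_SH -u $AMBARI_USER -p **** delete $AMBARI_HOST $CLUSTER_NAME hadoop-env $1"
    else
        "$CONFIGS_SH" -u "$AMBARI_USER" -p "$AMBARI_PASSWORD" \
            delete "$AMBARI_HOST" "$CLUSTER_NAME" hadoop-env "$1"
    fi
}

for key in dfs_ha_initial_namenode_active dfs_ha_initial_namenode_standby; do
    remove_hadoop_env_key "$key"
done
```

After both keys are gone, go back to the upgrade wizard and retry the failed NN restart step.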