SYMPTOM: While following the HDP pre-upgrade steps, the active NameNode went down when the "hdfs dfsadmin -saveNamespace" command was issued.

The command returned the following error:

================================================ 
hdfs@namenode1~$ hdfs dfsadmin -saveNamespace 
saveNamespace: Call From namenode1.example.com/10.160.81.30 to namenode1.example.com:8020 
failed on connection exception: java.net.ConnectException: Connection refused; For more details see: 
http://wiki.apache.org/hadoop/ConnectionRefused
================================================

ERROR:

2016-06-14 02:18:49,774 WARN  ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at namenode1.example.com/10.10.20.30:8020: Call From namenode1.example.com/10.10.20.30 to namenode1.example.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

2016-06-14 02:18:51,774 INFO  ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: namenode1.example.com/10.10.20.30:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)

2016-06-14 02:18:51,775 WARN  ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at namenode1.example.com/10.10.20.30:8020: Call From namenode1.example.com/10.10.20.30 to namenode1.example.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

2016-06-14 02:18:53,776 INFO  ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: namenode1.example.com/10.10.20.30:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)

2016-06-14 02:18:53,777 WARN  ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at namenode1.example.com/10.10.20.30:8020: Call From namenode1.example.com/10.10.20.30 to namenode1.example.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

2016-06-14 02:18:55,778 INFO  ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: namenode1.example.com/10.10.20.30:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)

2016-06-14 02:18:55,778 WARN  ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at namenode1.example.com/10.10.20.30:8020: Call From namenode1.example.com/10.10.20.30 to namenode1.example.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

OR

ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Swallowing exception in NameNodeEditLogRoller:
java.lang.IllegalStateException: Bad state: BETWEEN_LOG_SEGMENTS
        at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.getCurSegmentTxId(FSEditLog.java:493)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$NameNodeEditLogRoller.run(FSNamesystem.java:4358)
        at java.lang.Thread.run(Thread.java:745)
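To check whether a log file contains either of the signatures above, you can count the lines that match them. The following is a minimal Python sketch, not a Hortonworks or Apache tool. The two patterns are copied from the log excerpts in this article. The script name and log path in the usage comment are hypothetical placeholders, so substitute the file on your cluster that actually holds these messages.

```python
import re
import sys
from typing import Dict, Iterable

# Patterns copied from the log excerpts in this article (assumption: the
# message text is the same on your HDP version).
SIGNATURES = {
    # Repeated health-check failures: nothing is listening on the NameNode RPC port.
    "healthmonitor_refused": re.compile(
        r"ha\.HealthMonitor .*Transport-level exception trying to monitor health of NameNode"
        r".*java\.net\.ConnectException: Connection refused"
    ),
    # Edit log roller exception associated with HDFS-7871.
    "editlog_roller_bad_state": re.compile(
        r"java\.lang\.IllegalStateException: Bad state: BETWEEN_LOG_SEGMENTS"
    ),
}


def scan_log(lines: Iterable[str]) -> Dict[str, int]:
    """Return, for each signature, how many lines match it."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python scan_nn_log.py /path/to/hadoop-hdfs-namenode.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for name, count in scan_log(fh).items():
            print(f"{name}: {count}")
```

The script only counts matching lines. Treat a non-zero count as a prompt to compare the installed HDP version against the fixed releases listed below, not as proof that you are hitting this bug.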

ROOT CAUSE: This is a known bug, HDFS-7871 (https://issues.apache.org/jira/browse/HDFS-7871). It is fixed in HDP 2.2.9 and HDP 2.4.

RESOLUTION: Upgrading to HDP 2.4.0.0-169 resolved the issue.
