Created on 10-14-2021 12:39 AM - last edited on 10-14-2021 06:44 AM by cjervis
Hi Team, a client's production cluster suddenly went down three weeks ago. When they restarted the cluster services, all services came up, but a role change occurred: the standby NameNode became active, and the cluster is up and running. However, the other NameNode (the previously active one) does not come up when started. It is stuck at "RPC wait", and in CM its status shows as 'BUSY'. Can anyone help me with this issue?
Created 10-14-2021 11:15 AM
Can you please check the ZooKeeper logs and share any errors you find there?
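If the ZooKeeper log is large, a small filter can help pull out the lines worth sharing. This is only a rough sketch: the function name `filter_problem_lines` is made up for illustration, and it assumes the log level appears as a plain word (WARN, ERROR, or FATAL) somewhere in each line, as in typical log4j output. The log file location depends on your installation, so it is not filled in here.

```python
import re

# Matches the log level as a whole word anywhere in the line.
# Assumption: levels are written in upper case, as in log4j output.
LEVEL_RE = re.compile(r"\b(WARN|ERROR|FATAL)\b")


def filter_problem_lines(lines):
    """Return only the lines that mention WARN, ERROR, or FATAL."""
    return [line.rstrip("\n") for line in lines if LEVEL_RE.search(line)]


# Hypothetical usage (replace the path with your actual ZooKeeper log file):
# with open("zookeeper.log", errors="replace") as fh:
#     for line in filter_problem_lines(fh):
#         print(line)
```

Stack trace lines that follow an ERROR entry do not carry a level themselves, so they are skipped by this filter. Check the surrounding lines in the original file before posting.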
Created 10-14-2021 12:42 AM
Standby NN logfile:
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:322)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.UnsupportedOperationException: CollectionUsage threshold is not supported
at sun.management.MemoryPoolImpl.getCollectionUsageThreshold(MemoryPoolImpl.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
.
.
.
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.UnsupportedOperationException: Usage threshold is not supported
at sun.management.MemoryPoolImpl.getUsageThresholdCount(MemoryPoolImpl.java:189)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Created 10-14-2021 12:43 AM
FailoverController log file of the standby NN (previously the active NN):
2021-10-13 04:35:26,628 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at passive-mns-vm1/10.4.185.90:8022: java.net.ConnectException: Connection refused Call From passive-mns-vm1/10.4.185.90 to passive-mns-vm1:8022 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2021-10-13 04:35:28,628 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: passive-mns-vm1/10.4.185.90:8022. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2021-10-13 04:35:28,629 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at passive-mns-vm1/10.4.185.90:8022: java.net.ConnectException: Connection refused Call From passive-mns-vm1/10.4.185.90 to passive-mns-vm1:8022 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
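The "Connection refused" messages above usually mean that nothing was accepting connections on port 8022 when the failover controller tried, which would fit a NameNode that has not yet opened its RPC server. A quick way to confirm this from the same host is a plain TCP connect test. The sketch below is an illustration only: the helper name `port_is_open` is made up, and the host and port in the usage comment are simply the ones shown in the log.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Hypothetical usage, with the host and port taken from the log above:
# print(port_is_open("passive-mns-vm1", 8022))
```

A False result only says that the port is not reachable. It does not explain why the NameNode has not started listening, so the NameNode log is still the place to look for the cause.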
Created 10-14-2021 12:44 AM
Could you please help me with this issue?