Both NameNodes are standby! How can I make one of them active?

Explorer

It seems the election of the active NameNode did not take place correctly.

(Hortonworks / Ambari 2.4.0.1 / high availability)

12 Replies


Hi @Jessika314 ninja,

Can you confirm that ZooKeeper is OK and running? Could you check the logs? Did you try restarting the HDFS service through Ambari?
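For reference, both checks can also be done quickly from the command line. This is only a sketch assuming the default ZooKeeper client port (2181) and that nn1/nn2 are the NameNode IDs defined by dfs.ha.namenodes.<nameservice> in hdfs-site.xml; substitute your own hostnames and IDs:

    # ZooKeeper health check - each ZooKeeper server should answer "imok"
    echo ruok | nc hdpmaster1 2181
    # HA state reported by each NameNode (run as the hdfs user, not root)
    sudo -u hdfs hdfs haadmin -getServiceState nn1
    sudo -u hdfs hdfs haadmin -getServiceState nn2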

Thanks.

Explorer

I have done all that, but nothing helps.


Could you share the logs you have for both NameNodes?
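For reference, on a typical HDP/Ambari install the NameNode logs live under /var/log/hadoop/hdfs (the path and file name pattern below are assumptions; adjust for your hosts):

    # Grab the recent portion of the NameNode log on each NameNode host
    tail -n 500 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log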

Explorer

NAMENODE1

root. Superuser privilege is required
2016-10-24 11:25:52,065 INFO ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(271)) - Triggering log roll on remote NameNode hdpmaster2/192.168.1.162:8020
2016-10-24 11:25:52,108 WARN ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(276)) - Unable to trigger a roll of the active NN
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category JOURNAL is not supported in state standby
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1932)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6029)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1219)
        at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
        at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)

        at org.apache.hadoop.ipc.Client.call(Client.java:1426)
        at org.apache.hadoop.ipc.Client.call(Client.java:1363)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy21.rollEditLog(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:148)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:273)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:315)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:449)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
2016-10-24 11:25:52,400 INFO ipc.Server (Server.java:logException(2287)) - IPC Server handler 19 on 8020, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from 192.168.1.161:35816 Call#4509 Retry#0: org.apache.hadoop.security.AccessControlException: Access denied for user root. Superuser privilege is required
:.......................

rotocol.getServiceStatus from 192.168.1.161:35837 Call#4526 Retry#0: org.apache.hadoop.security.AccessControlException: Access denied for user root. Superuser privilege is required
2016-10-24 11:26:12,138 WARN namenode.FSEditLog (JournalSet.java:selectInputStreams(280)) - Unable to determine input streams from QJM to [192.168.1.161:8485, 192.168.1.162:8485, 192.168.1.163:8485]. Skipping.
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:471)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:278)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1507)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1531)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:214)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:449)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)

Super Guru
@Jessika314 ninja

Stop both NameNodes. If you can recall which one was the active NameNode most recently, start just that one and leave the other down.

Make sure the ZKFCs and JournalNodes are running while you do this.

Once that NameNode is started, check whether it shows Active status in the Ambari UI. If yes, wait for the NameNode to come out of safe mode.

If the NameNode does not show as active, please check the ZKFC and NameNode logs.

Once the NameNode is out of safe mode, start the other NameNode and check the status in the Ambari UI.
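For reference, a rough command-line version of these checks, assuming nn1 is the NameNode ID of the node you bring up first (check dfs.ha.namenodes.<nameservice> in hdfs-site.xml) and that the commands run as the hdfs user, which also avoids the "Superuser privilege is required" errors seen in the log above:

    # Is the running NameNode still in safe mode?
    sudo -u hdfs hdfs dfsadmin -safemode get
    # What HA state does it report?
    sudo -u hdfs hdfs haadmin -getServiceState nn1
    # If ZKFC-based automatic failover is not working, a manual transition can be
    # forced - use with care, and only while the other NameNode is down
    sudo -u hdfs hdfs haadmin -transitionToActive --forcemanual nn1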

Let me know if that helps.

Explorer

I have problems starting the ZKFailoverControllers; they shut down automatically a few seconds after being started.

Contributor

Hi @Jessika314_ninja,

It looks like the NameNodes cannot contact the JournalNodes - can you check that the JNs are running?

You say the ZKFC stops after you start it - can you share some of the last 2-500 lines of its log?
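For reference, on a standard HDP layout the ZKFC log is usually found with something like the following (path and file name pattern are assumptions; adjust for your hosts):

    # Last part of the ZKFC log on the NameNode host
    tail -n 500 /var/log/hadoop/hdfs/hadoop-hdfs-zkfc-*.log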

Thanks

Dave

Explorer

The JNs are running.

Here is the ZKFC log:

Refused
2016-10-24 15:10:43,506 INFO ipc.Client (Client.java:handleConnectionFailure(874)) - Retrying connect to server: hdpmaster1/192.168.1.161:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2016-10-24 15:10:43,506 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(211)) - Transport-level exception trying to monitor health of NameNode at hdpmaster1/192.168.1.161:8020: java.net.ConnectException: Connection refused Call From hdpmaster1/192.168.1.161 to hdpmaster1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

Contributor

OK, this is due to the NameNode not being active.

Did you start your NameNode as the hdfs user? What are the permissions on the JournalNode directories?

AccessControlException: Access denied for user root. Superuser privilege is required
2016-10-24 11:26:12,138 WARN namenode.FSEditLog (JournalSet.java:selectInputStreams(280)) - Unable to determine input streams from QJM to [192.168.1.161:8485, 192.168.1.162:8485, 192.168.1.163:8485].

Can you make sure your JournalNodes are running as hdfs, and your NameNodes as well?
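For reference, a quick way to check both (a sketch only; /hadoop/hdfs/journal is just the common default for dfs.journalnode.edits.dir, so check hdfs-site.xml for the real path):

    # Which user owns the NameNode and JournalNode processes? It should be hdfs, not root
    ps -ef | grep -iE 'journalnode|namenode' | grep -v grep
    # Ownership and permissions of the JournalNode edits directory (path is an assumption)
    ls -ld /hadoop/hdfs/journal /hadoop/hdfs/journal/*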