Created 10-24-2016 10:11 AM
It seems the election of the active NameNode did not take place correctly.
(Hortonworks / Ambari 2.4.0.1 / high availability)
Created 10-24-2016 10:14 AM
Can you confirm that ZooKeeper is OK and running? Could you check the logs? Did you try restarting the HDFS service through Ambari?
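If it helps, a rough way to check the ZooKeeper quorum from the command line (this assumes the default client port 2181, that nc is installed, and uses hdpmaster1 as an example host; run it against each ZooKeeper server):

# Ask the ZooKeeper server whether it is healthy ("imok" expected)
echo ruok | nc hdpmaster1 2181

# Show whether this server is currently leader or follower
echo stat | nc hdpmaster1 2181 | grep Mode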
Thanks.
Created 10-24-2016 10:15 AM
I have done all that, but nothing helps.
Created 10-24-2016 10:17 AM
Could you share the logs you have for both namenodes?
Created 10-24-2016 10:38 AM
NAMENODE1
root. Superuser privilege is required
2016-10-24 11:25:52,065 INFO ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(271)) - Triggering log roll on remote NameNode hdpmaster2/192.168.1.162:8020
2016-10-24 11:25:52,108 WARN ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(276)) - Unable to trigger a roll of the active NN
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category JOURNAL is not supported in state standby
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1932)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6029)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1219)
    at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
    at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
    at org.apache.hadoop.ipc.Client.call(Client.java:1426)
    at org.apache.hadoop.ipc.Client.call(Client.java:1363)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy21.rollEditLog(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:148)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:273)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:315)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:449)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
2016-10-24 11:25:52,400 INFO ipc.Server (Server.java:logException(2287)) - IPC Server handler 19 on 8020, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from 192.168.1.161:35816 Call#4509 Retry#0: org.apache.hadoop.security.AccessControlException: Access denied for user root. Superuser privilege is required
...
...rotocol.getServiceStatus from 192.168.1.161:35837 Call#4526 Retry#0: org.apache.hadoop.security.AccessControlException: Access denied for user root. Superuser privilege is required
2016-10-24 11:26:12,138 WARN namenode.FSEditLog (JournalSet.java:selectInputStreams(280)) - Unable to determine input streams from QJM to [192.168.1.161:8485, 192.168.1.162:8485, 192.168.1.163:8485]. Skipping.
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
    at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:471)
    at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:278)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1507)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1531)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:214)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:449)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
Created 10-24-2016 12:53 PM
Stop both NameNodes. If you can recall which NameNode was active last time, start only that one and leave the other down.
Make sure the ZKFCs and JournalNodes are running when you do this.
Once that NameNode is started, check whether it shows Active status in the Ambari UI. If yes, wait for the NameNode to come out of safe mode.
If the NameNode does not show Active, please check the ZKFC and NameNode logs.
Once the NameNode is out of safe mode, start the other NameNode and check the status in the Ambari UI (or with the commands sketched below).
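For reference, a rough sketch of how to check the HA state and safe mode from the command line. nn1/nn2 are example HA service IDs; use the values from dfs.ha.namenodes.<nameservice> in your hdfs-site.xml:

# Check which NameNode is currently Active / Standby
sudo -u hdfs hdfs haadmin -getServiceState nn1
sudo -u hdfs hdfs haadmin -getServiceState nn2

# Check whether the NameNode is still in safe mode
sudo -u hdfs hdfs dfsadmin -safemode get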
Let me know if that helps.
Created 10-24-2016 01:34 PM
I have problems starting the ZKFailoverControllers; they shut down automatically a few seconds after being started.
Created 10-24-2016 01:54 PM
Hi @Jessika314_ninja,
It looks like the NameNodes cannot contact the JournalNodes. Can you check that the JNs are running?
You say the ZKFC stops after you start it. Can you share the last few hundred lines of its log?
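Something like the following should show whether the JN and ZKFC processes are up and grab the end of the ZKFC log. The log path is a typical HDP location; adjust it to wherever your ZKFC log actually lives:

# Check the JournalNode and ZKFC processes on each master node
jps | grep -E 'JournalNode|DFSZKFailoverController'

# Tail the ZKFC log (filename usually includes the host name)
tail -n 500 /var/log/hadoop/hdfs/hadoop-hdfs-zkfc-*.log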
Thanks
Dave
Created 10-24-2016 02:16 PM
The JNs are running.
Here is the ZKFC log:
Refused
2016-10-24 15:10:43,506 INFO ipc.Client (Client.java:handleConnectionFailure(874)) - Retrying connect to server: hdpmaster1/192.168.1.161:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2016-10-24 15:10:43,506 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(211)) - Transport-level exception trying to monitor health of NameNode at hdpmaster1/192.168.1.161:8020: java.net.ConnectException: Connection refused
Call From hdpmaster1/192.168.1.161 to hdpmaster1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Created 10-24-2016 03:03 PM
OK, this is because the NameNode is not active.
Did you start your NameNode as the hdfs user? What are the permissions on the JournalNode directories?
AccessControlException: Access denied for user root. Superuser privilege is required
2016-10-24 11:26:12,138 WARN namenode.FSEditLog (JournalSet.java:selectInputStreams(280)) - Unable to determine input streams from QJM to [192.168.1.161:8485, 192.168.1.162:8485, 192.168.1.163:8485].
Can you make sure your JournalNodes, and also your NameNodes, are running as hdfs?
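A quick way to check, sketched under the assumption of a default HDP layout (the JournalNode edits directory is often /hadoop/hdfs/journal; use the path from dfs.journalnode.edits.dir in your config):

# The NameNode, JournalNode and ZKFC processes should all be owned by hdfs
ps -ef | grep -E 'NameNode|JournalNode|DFSZKFailoverController' | grep -v grep

# The JournalNode edits directory and its contents should be owned by hdfs
ls -ld /hadoop/hdfs/journal
ls -lR /hadoop/hdfs/journal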