
NameNode fails over to standby every night

The NameNode fails over to standby every night. Any idea what's happening here?


Logs:

2019-05-01 00:07:14,977 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for SVCIMASTRANSFER@ADMIN.xxxx.CO.UK (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2019-05-01 00:07:17,766 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for SVCIMASTRANSFER@ADMIN.xxxx.CO.UK (auth:KERBEROS)

2019-05-01 00:07:17,775 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for SVCIMASTRANSFER@ADMIN.xxxx.CO.UK (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2019-05-01 00:07:41,037 INFO ha.EditLogTailer (EditLogTailer.java:triggerActiveLogRoll(323)) - Triggering log roll on remote NameNode

2019-05-01 00:07:41,226 INFO namenode.FSImage (FSImage.java:loadEdits(835)) - Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@721c1c9c expecting start txid #77470462

2019-05-01 00:07:41,226 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(142)) - Start loading edits file http://xxxxxxxx02.admin.xxxx.co.uk:8480/getJournal?jid=NBSHDP01&segmentTxId=77470462&storageInfo=-63..., http://xxxxxxxx01.admin.xxxx.co.uk:8480/getJournal?jid=NBSHDP01&segmentTxId=77470462&storageInfo=-63...

2019-05-01 00:07:41,226 INFO namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 'http://xxxxxxxx02.admin.xxxx.co.uk:8480/getJournal?jid=NBSHDP01&segmentTxId=77470462&storageInfo=-63%3A222874268%3A0%3ACID-efc23aa1-55ef-44b5-85f0-c561146ad55b, http://xxxxxxxx01.admin.xxxx.co.uk:8480/getJournal?jid=NBSHDP01&segmentTxId=77470462&storageInfo=-63...; to transaction ID 77470462

2019-05-01 00:07:41,226 INFO namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 'http://xxxxxxxx02.admin.xxxx.co.uk:8480/getJournal?jid=NBSHDP01&segmentTxId=77470462&storageInfo=-63%3A222874268%3A0%3ACID-efc23aa1-55ef-44b5-85f0-c561146ad55b' to transaction ID 77470462

2019-05-01 00:07:41,253 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(145)) - Edits file http://xxxxxxxx02.admin.xxxx.co.uk:8480/getJournal?jid=NBSHDP01&segmentTxId=77470462&storageInfo=-63..., http://xxxxxxxx01.admin.xxxx.co.uk:8480/getJournal?jid=NBSHDP01&segmentTxId=77470462&storageInfo=-63... of size 16778 edits # 108 loaded in 0 seconds

2019-05-01 00:07:41,253 INFO ha.EditLogTailer (EditLogTailer.java:doTailEdits(275)) - Loaded 108 edits starting from txid 77470461

2019-05-01 00:08:11,831 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for SVCIMASTRANSFER@ADMIN.xxxx.CO.UK (auth:KERBEROS)

2019-05-01 00:08:11,841 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for SVCIMASTRANSFER@ADMIN.xxxx.CO.UK (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2019-05-01 00:08:14,643 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for SVCIMASTRANSFER@ADMIN.xxxx.CO.UK (auth:KERBEROS)

2019-05-01 00:08:14,652 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for SVCIMASTRANSFER@ADMIN.xxxx.CO.UK (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2019-05-01 00:08:32,048 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for infa@ADMIN.xxxx.CO.UK (auth:KERBEROS)

2019-05-01 00:08:32,049 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for svcimastransfer (auth:PROXY) via infa@ADMIN.xxxx.CO.UK (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2019-05-01 00:08:35,476 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for infa@ADMIN.xxxx.CO.UK (auth:KERBEROS)

2019-05-01 00:08:35,476 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for svcimastransfer (auth:PROXY) via infa@ADMIN.xxxx.CO.UK (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2019-05-01 00:08:35,559 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for infa@ADMIN.xxxx.CO.UK (auth:KERBEROS)

2019-05-01 00:08:35,560 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for svcimastransfer (auth:PROXY) via infa@ADMIN.xxxx.CO.UK (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2019-05-01 00:08:35,648 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for infa@ADMIN.xxxx.CO.UK (auth:KERBEROS)

2019-05-01 00:08:35,648 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for svcimastransfer (auth:PROXY) via infa@ADMIN.xxxx.CO.UK (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2019-05-01 00:08:35,777 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for infa@ADMIN.xxxx.CO.UK (auth:KERBEROS)


At present the log shows:

2019-05-01 08:31:53,375 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for hive/xxxxxxxxxx.admin.xxxxx.co.uk@ADMIN.xxxxx.CO.UK (auth:KERBEROS)

2019-05-01 08:31:53,375 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for SVCIMASTRANSFER@ADMIN.xxxxx.CO.UK (auth:PROXY) via hive/xxxxxxxxxx.admin.xxxxx.co.uk@ADMIN.xxxxx.CO.UK (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2019-05-01 08:31:53,405 WARN ipc.Server (Server.java:saslProcess(1557)) - Auth failed for 39.7.48.5:56352:null (DIGEST-MD5: IO error acquiring password) with true cause: (Operation category READ is not supported in state standby)

2019-05-01 08:31:53,405 INFO ipc.Server (Server.java:doRead(1006)) - Socket Reader #1 for port 8020: readAndProcess from client 39.7.48.5 threw exception [org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby]

2019-05-01 08:31:53,409 WARN ipc.Server (Server.java:saslProcess(1557)) - Auth failed for 39.7.48.5:56354:null (DIGEST-MD5: IO error acquiring password) with true cause: (Operation category READ is not supported in state standby)

2019-05-01 08:31:53,409 INFO ipc.Server (Server.java:doRead(1006)) - Socket Reader #1 for port 8020: readAndProcess from client 39.7.48.5 threw exception [org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby]

2019-05-01 08:31:53,446 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for hive/xxxxxxxx02.admin.xxxxx.co.uk@ADMIN.xxxxx.CO.UK (auth:KERBEROS)

2019-05-01 08:31:53,447 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(137)) - Authorization successful for SVCIMASTRANSFER@ADMIN.xxxxx.CO.UK (auth:PROXY) via hive/xxxxxxxx02.admin.xxxxx.co.uk@ADMIN.xxxxx.CO.UK (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol

2019-05-01 08:31:53,478 WARN ipc.Server (Server.java:saslProcess(1557)) - Auth failed for 39.7.48.19:40278:null (DIGEST-MD5: IO error acquiring password) with true cause: (Operation category READ is not supported in state standby)


Mentor

@PK

The NameNode failover looks like a normal process, and the edits files are being applied correctly. The KDC also appears to be working, with no Kerberos errors in the log. That leaves one culprit!

Your application client at IP 39.7.48.5 appears to be configured to connect to one specific NameNode, so when the failover happens it hits [org.apache.hadoop.ipc.StandbyException: Operation category READ is not supported in state standby]. The client should be configured to use the HA nameservice, which works like a DNS name for the NameNodes, rather than a hard-coded NameNode address!
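To make the check concrete, here is a minimal Python sketch that tells a logical nameservice URI apart from a hard-coded NameNode address. It is only an illustration, not a Hadoop API: the function name `targets_nameservice` is made up, `nn1.example.com` is a placeholder host, and using `NBSHDP01` as the nameservice ID is an assumption based on the `jid=NBSHDP01` in your log (the journal ID usually matches the nameservice, but please confirm it against `dfs.nameservices` in your hdfs-site.xml).

```python
from typing import Iterable
from urllib.parse import urlparse


def targets_nameservice(default_fs: str, nameservices: Iterable[str]) -> bool:
    """Return True if an hdfs:// URI addresses a logical HA nameservice
    rather than a single NameNode host (hypothetical helper)."""
    parsed = urlparse(default_fs)
    if parsed.scheme != "hdfs":
        return False
    # A logical nameservice URI carries no port; host:port means the
    # client is pinned to one NameNode and will not follow a failover.
    if parsed.port is not None:
        return False
    # urlparse lowercases the hostname, so compare case-insensitively.
    known = {ns.lower() for ns in nameservices}
    return (parsed.hostname or "").lower() in known


if __name__ == "__main__":
    # Assumed nameservice ID, taken from jid=NBSHDP01 in the log above.
    configured = ["NBSHDP01"]
    for uri in ("hdfs://NBSHDP01", "hdfs://nn1.example.com:8020"):
        print(uri, targets_nameservice(uri, configured))
```

You would feed it the `fs.defaultFS` value (or whatever HDFS URI the job uses) from the client at 39.7.48.5. If it comes back False, that client is pinned to a single NameNode and will break after every failover.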


Can you validate my suspicion?

HTH