
Active NameNode not coming up

Expert Contributor

Hi,

 

I had to reboot one of my data nodes. While rebooting it, I realized that this data node also acts as a JournalNode.

The cluster is Kerberized and HA-enabled. After the reboot, both NameNodes come up as Standby. I have tried every restart method, with no luck.
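
For reference, the HA state can be checked from the command line like this (nn1 and nn2 below are placeholders for the NameNode IDs configured in dfs.ha.namenodes):

# Report the HA state of each NameNode; both currently come back as "standby"
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# Forcing a manual transition (bypassing the ZKFCs) would be:
# hdfs haadmin -transitionToActive --forcemanual nn1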

 

Here are the ZKFC (ZooKeeper Failover Controller) logs:

2020-02-07 20:22:47,823 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(211)) - Transport-level exception trying to monitor health of NameNode at admin1.XXXX.io/XX.4.48.11:8020: java.net.SocketTimeoutException: 45000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/XX.XX.48.11:43263 remote=admin1.XXXX.io/XX.XX.48.11:8020] Call From admin1.XXXX.io/XX.XX.48.11 to admin1.XXXX.io:8020 failed on socket timeout exception: java.net.SocketTimeoutException: 45000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/XX.XX.48.11:43263 remote=admin1.XXXX.io/XX.XX.48.11:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
2020-02-07 20:22:47,823 INFO ha.HealthMonitor (HealthMonitor.java:enterState(249)) - Entering state SERVICE_NOT_RESPONDING
2020-02-07 20:23:01,410 FATAL ha.ZKFailoverController (ZKFailoverController.java:becomeActive(401)) - Couldn't make NameNode at admin1.XXXX.io/XX.XX.48.11:8020 active
java.io.EOFException: End of File Exception between local host is: "admin1.XXXX.io/XX.XX.48.11"; destination host is: "admin1.XXXX.io":8020; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1558)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy9.transitionToActive(Unknown Source)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToActive(HAServiceProtocolClientSideTranslatorPB.java:100)
at org.apache.hadoop.ha.HAServiceProtocolHelper.transitionToActive(HAServiceProtocolHelper.java:48)
at org.apache.hadoop.ha.ZKFailoverController.becomeActive(ZKFailoverController.java:390)
at org.apache.hadoop.ha.ZKFailoverController.access$900(ZKFailoverController.java:61)
at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.becomeActive(ZKFailoverController.java:880)
at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:864)
at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:468)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1119)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1014)
2020-02-07 20:23:01,411 WARN ha.ActiveStandbyElector (ActiveStandbyElector.java:becomeActive(868)) - Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: Couldn't transition to active

 

NameNode logs:

***********************

 

2020-02-08 06:18:55,184 INFO ipc.Server (Server.java:authorizeConnection(2235)) - Connection from XX.XX.48.11:34267 for protocol org.apache.hadoop.ha.HAServiceProtocol is unauthorized for user nn/admin1.XXXXXXXXX.io@XXXXXXXXXXX.CORP (auth:PROXY) via $FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP (auth:KERBEROS)
2020-02-08 06:18:55,185 INFO ipc.Server (Server.java:doRead(1006)) - Socket Reader #1 for port 8020: readAndProcess from client XX.XX.48.11 threw exception [org.apache.hadoop.security.authorize.AuthorizationException: User: $FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP is not allowed to impersonate nn/admin1.XXXXXXXXX.io@XXXXXXXXXXX.CORP]
2020-02-08 06:18:56,190 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for $FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP (auth:KERBEROS)
2020-02-08 06:18:56,191 INFO ipc.Server (Server.java:authorizeConnection(2235)) - Connection from XX.XX.48.11:41305 for protocol org.apache.hadoop.ha.HAServiceProtocol is unauthorized for user nn/admin1.XXXXXXXXX.io@XXXXXXXXXXX.CORP (auth:PROXY) via $FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP (auth:KERBEROS)
2020-02-08 06:18:56,191 INFO ipc.Server (Server.java:doRead(1006)) - Socket Reader #1 for port 8020: readAndProcess from client XX.XX.48.11 threw exception [org.apache.hadoop.security.authorize.AuthorizationException: User: $FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP is not allowed to impersonate nn/admin1.XXXXXXXXX.io@XXXXXXXXXXX.CORP]
2020-02-08 06:18:57,197 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for $FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP (auth:KERBEROS)
2020-02-08 06:18:57,198 INFO ipc.Server (Server.java:authorizeConnection(2235)) - Connection from XX.XX.48.11:44308 for protocol org.apache.hadoop.ha.HAServiceProtocol is unauthorized for user nn/admin1.XXXXXXXXX.io@XXXXXXXXXXX.CORP (auth:PROXY) via $FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP (auth:KERBEROS)
2020-02-08 06:18:57,198 INFO ipc.Server (Server.java:doRead(1006)) - Socket Reader #1 for port 8020: readAndProcess from client XX.XX.48.11 threw exception [org.apache.hadoop.security.authorize.AuthorizationException: User: $FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP is not allowed to impersonate nn/admin1.XXXXXXXXX.io@XXXXXXXXXXX.CORP]
2020-02-08 06:18:58,204 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for $FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP (auth:KERBEROS)
2020-02-08 06:18:58,205 INFO ipc.Server (Server.java:authorizeConnection(2235)) - Connection from XX.XX.48.11:40352 for protocol org.apache.hadoop.ha.HAServiceProtocol is unauthorized for user nn/admin1.XXXXXXXXX.io@XXXXXXXXXXX.CORP (auth:PROXY) via $FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP (auth:KERBEROS)
2020-02-08 06:18:58,205 INFO ipc.Server (Server.java:doRead(1006)) - Socket Reader #1 for port 8020: readAndProcess from client XX.XX.48.11 threw exception [org.apache.hadoop.security.authorize.AuthorizationException: User: $FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP is not allowed to impersonate nn/admin1.XXXXXXXXX.io@XXXXXXXXXXX.CORP]
2020-02-08 06:18:59,211 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for $FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP (auth:KERBEROS)
2020-02-08 06:18:59,211 INFO ipc.Server (Server.java:authorizeConnection(2235)) - Connection from XX.XX.48.11:39431 for protocol org.apache.hadoop.ha.HAServiceProtocol is unauthorized for user nn/admin1.XXXXXXXXX.io@XXXXXXXXXXX.CORP (auth:PROXY) via $FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP (auth:KERBEROS)
2020-02-08 06:18:59,212 INFO ipc.Server (Server.java:doRead(1006)) - Socket Reader #1 for port 8020: readAndProcess from client XX.XX.48.11 threw exception [org.apache.hadoop.security.authorize.AuthorizationException: User: $FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP is not allowed to impersonate nn/admin1.XXXXXXXXX.io@XXXXXXXXXXX.CORP]
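
The principal making these calls ($FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP) looks odd. A quick way to see which local user a given principal maps to under the current auth_to_local rules is the stock Hadoop utility below (shown with the masked principals from the logs as example arguments):

# Print the local short name produced by hadoop.security.auth_to_local for a principal
hadoop org.apache.hadoop.security.HadoopKerberosName '$FC8300-R8HGK424M6OR@XXXXXXXXXXX.CORP'
hadoop org.apache.hadoop.security.HadoopKerberosName nn/admin1.XXXX.io@XXXXXXXXXXX.CORP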

 

 

1 ACCEPTED SOLUTION

Expert Contributor

Alright, all good now. The problem was with AD. In our environment the KDC is AD, and in AD there are two fields: "User logon name" and "User logon name (pre-Windows 2000)". Usually these two attributes have the same value. In our case the account names were generated automatically when we Kerberized the cluster, and for those accounts the two fields were different: "User logon name (pre-Windows 2000)" was a 20-character alphanumeric string. In a Kerberized cluster the service accounts have to impersonate the Hadoop service users such as "nn", "dn", and "rm". So we edited all the service accounts in AD, i.e. made "User logon name (pre-Windows 2000)" the same as "User logon name". There is also a property in the HDFS config, "Auth_to_Local mappings" (hadoop.security.auth_to_local), where we added rules to convert the pattern (the service account name in AD) to the local service users (hdfs, nn, hive, dn, etc.).
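
As an illustration only (the realm is masked and the exact target users depend on which local accounts the services run as in your cluster), the kind of rules we added to hadoop.security.auth_to_local look like this:

RULE:[2:$1@$0](nn@XXXXXXXXXXX.CORP)s/.*/hdfs/
RULE:[2:$1@$0](dn@XXXXXXXXXXX.CORP)s/.*/hdfs/
RULE:[2:$1@$0](jn@XXXXXXXXXXX.CORP)s/.*/hdfs/
RULE:[2:$1@$0](rm@XXXXXXXXXXX.CORP)s/.*/yarn/
RULE:[2:$1@$0](nm@XXXXXXXXXXX.CORP)s/.*/yarn/
RULE:[2:$1@$0](hive@XXXXXXXXXXX.CORP)s/.*/hive/
DEFAULT

Each rule takes a two-component principal such as nn/admin1.XXXX.io@XXXXXXXXXXX.CORP, matches the first component plus realm against the regex, and rewrites the match to the local service user. The resulting mappings can be verified with the HadoopKerberosName utility mentioned earlier in the thread.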


3 REPLIES

Expert Contributor

2020-02-08 06:24:59,879 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for $7D8300-H79FE35P680K@XXXXXXXX.CORP (auth:KERBEROS)
2020-02-08 06:24:59,880 INFO ipc.Server (Server.java:authorizeConnection(2235)) - Connection from XX.XX.48.17:44290 for protocol org.apache.hadoop.ha.HAServiceProtocol is unauthorized for user nn/admin2.XXXXXXXXXX.io@XXXXXXXX.CORP (auth:PROXY) via $7D8300-H79FE35P680K@XXXXXXX.CORP (auth:KERBEROS)

Expert Contributor

@Shelton, from previous posts I see that you have dealt with these issues. Any ideas?

 

Also, in the JournalNode logs I see the following:

************

2020-02-08 15:33:58,011 INFO server.KerberosAuthenticationHandler (KerberosAuthenticationHandler.java:init(262)) - Login using keytab /etc/security/keytabs/spnego.service.keytab, for principal HTTP/node2.prod.iad.XXXXXXXXXXX.XXXXXXXXXXX.io@XXXXXXXXXXX.CORP
2020-02-08 15:33:58,018 INFO server.KerberosAuthenticationHandler (KerberosAuthenticationHandler.java:init(281)) - Map server: node2.prod.iad.XXXXXXXXXXX.XXXXXXXXXXX.io to principal: [HTTP/node2.prod.iad.XXXXXXXXXXX.XXXXXXXXXXX.io@XXXXXXXXXXX.CORP], added = true
2020-02-08 15:33:58,034 INFO mortbay.log (Slf4jLog.java:info(67)) - Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8480
2020-02-08 15:33:58,146 INFO ipc.CallQueueManager (CallQueueManager.java:<init>(75)) - Using callQueue: class java.util.concurrent.LinkedBlockingQueue scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2020-02-08 15:33:58,164 INFO ipc.Server (Server.java:run(821)) - Starting Socket Reader #1 for port 8485
2020-02-08 15:33:58,402 INFO ipc.Server (Server.java:run(1064)) - IPC Server Responder: starting
2020-02-08 15:33:58,403 INFO ipc.Server (Server.java:run(900)) - IPC Server listener on 8485: starting
2020-02-08 15:34:19,823 INFO ipc.Server (Server.java:saslProcess(1573)) - Auth successful for $7D8300-H79FE35P680K@XXXXXXXXXXX.CORP (auth:KERBEROS)
2020-02-08 15:34:19,874 INFO ipc.Server (Server.java:authorizeConnection(2235)) - Connection from XX.XX.48.17:43312 for protocol org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol is unauthorized for user nn/admin2.prod.iad.XXXXXXXXXXX.XXXXXXXXXXX.io@XXXXXXXXXXX.CORP (auth:PROXY) via $7D8300-H79FE35P680K@XXXXXXXXXXX.CORP (auth:KERBEROS)
2020-02-08 15:34:19,875 INFO ipc.Server (Server.java:doRead(1006)) - Socket Reader #1 for port 8485: readAndProcess from client XX.XX.48.17 threw exception [org.apache.hadoop.security.authorize.AuthorizationException: User: $7D8300-H79FE35P680K@XXXXXXXXXXX.CORP is not allowed to impersonate nn/admin2.prod.iad.XXXXXXXXXXX.XXXXXXXXXXX.io@XXXXXXXXXXX.CORP]
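
For reference, the principal the NameNode actually logs in with can be cross-checked against its keytab (the path below is the usual HDP location, the same directory as the spnego keytab above; adjust if yours differs):

# List the principals stored in the NameNode service keytab
klist -kt /etc/security/keytabs/nn.service.keytab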
