Unable to start HBase Master after enabling Kerberos

Explorer

After enabling Kerberos, the HBase Master fails to restart. On investigation, I found that the /hbase-secure/master znode is missing in ZooKeeper.

ZooKeeper shows the following ACL:

[zk: localhost:2181(CONNECTED) 1] getAcl /hbase-secure
'world,'anyone
: r
'sasl,'hbase
: cdrwa
'sasl,'hbase
: cdrwa
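ZooKeeper's permission letters stand for CREATE (c), DELETE (d), READ (r), WRITE (w) and ADMIN (a). Read that way, the ACL above gives world:anyone read-only access and the SASL-authenticated hbase principal full access, so only a client authenticated as hbase (or a ZooKeeper superuser) can create a child znode such as master under /hbase-secure. A minimal sketch, assuming the exact getAcl output layout shown above; parse_get_acl is a hypothetical helper, not part of any ZooKeeper tool:

```python
# Minimal sketch: parse the text printed by zkCli's `getAcl`
# (alternating "'scheme,'id" and ": perms" lines) and expand the
# permission letters. The layout is assumed from the output in this
# thread, not from a documented format contract.
from __future__ import annotations

PERMS = {
    "c": "CREATE",
    "d": "DELETE",
    "r": "READ",
    "w": "WRITE",
    "a": "ADMIN",
}


def parse_get_acl(output: str) -> list[tuple[str, str, list[str]]]:
    """Return (scheme, id, [permission names]) for each ACL entry."""
    lines = [ln.strip() for ln in output.strip().splitlines() if ln.strip()]
    entries = []
    # Entries come in pairs: an identity line, then a permissions line.
    for ident, perms in zip(lines[0::2], lines[1::2]):
        scheme, _, user = ident.replace("'", "").partition(",")
        letters = perms.lstrip(":").strip()
        entries.append((scheme, user, [PERMS[ch] for ch in letters]))
    return entries


ACL_OUTPUT = """
'world,'anyone
: r
'sasl,'hbase
: cdrwa
'sasl,'hbase
: cdrwa
"""

for scheme, user, perms in parse_get_acl(ACL_OUTPUT):
    print(f"{scheme}:{user} -> {', '.join(perms)}")
```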

I have tried the suggestions in the related threads, e.g.:

https://community.hortonworks.com/content/supportkb/151088/how-to-force-remove-znode-with-stale-acl....

https://community.hortonworks.com/articles/82405/how-to-remove-acl-protected-zk-node.html

Any ideas?

1 ACCEPTED SOLUTION

Explorer

Solution for the desperate souls like me:

I copied atlas-application.properties from /etc/atlas/conf to /etc/hbase/conf, changed its permissions to 744, and changed its ownership to atlas:hadoop.

Then I restarted HBase and Atlas.

I don't know if it's the right thing to do, but it worked! I can start to laugh again!
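The manual steps above can be sketched in Python. This only mirrors what the poster did (the poster was not sure it is the proper fix); the helper name install_atlas_properties is hypothetical, and the directories, user and group are parameters so it can be tried on scratch directories before touching /etc:

```python
# Sketch of the poster's manual fix: copy atlas-application.properties
# from the Atlas config dir into the HBase config dir, then set mode 744
# and owner atlas:hadoop. install_atlas_properties is a hypothetical
# helper; defaults are the paths and owner quoted in the post.
from __future__ import annotations

import os
import shutil

PROPS = "atlas-application.properties"


def install_atlas_properties(
    src_dir: str = "/etc/atlas/conf",
    dst_dir: str = "/etc/hbase/conf",
    user: str | int = "atlas",
    group: str | int = "hadoop",
) -> str:
    """Copy the Atlas properties file and apply the poster's mode and owner."""
    dst = os.path.join(dst_dir, PROPS)
    # Copy file contents (and metadata, which chmod/chown then override).
    shutil.copy2(os.path.join(src_dir, PROPS), dst)
    # 0o744 = rwxr--r--, the permission bits given in the post.
    os.chmod(dst, 0o744)
    # Changing ownership to another user normally requires root.
    shutil.chown(dst, user=user, group=group)
    return dst
```

On the HBase host this would be run as root with the defaults, followed by a restart of HBase and Atlas (for example from Ambari), as the poster describes.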


Cloudera Employee

Hi Mujeeb,

Could you please provide more details, such as the error you are receiving, the relevant part of the logs, and what you mean by "found that in zookeeper /hbase-secure/master node is missing"?

Please elaborate.

Regards,

AQ

Explorer
2018-11-14 16:15:03,009 WARN  [master/hdata4:16000] master.ActiveMasterManager: Failed get of master address: java.io.IOException: Can't get master address from ZooKeeper; znode data == null
2018-11-14 16:15:03,009 INFO  [master/hdata4:16000] assignment.AssignmentManager: Stopping assignment manager
2018-11-14 16:15:03,032 WARN  [master/hdata4:16000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
2018-11-14 16:15:03,033 INFO  [master/hdata4:16000] procedure2.RemoteProcedureDispatcher: Stopping procedure remote dispatcher
2018-11-14 16:15:03,033 INFO  [master/hdata4:16000] procedure2.ProcedureExecutor: Stopping
2018-11-14 16:15:03,036 INFO  [master/hdata4:16000] wal.WALProcedureStore: Stopping the WAL Procedure Store, isAbort=false
2018-11-14 16:15:03,070 ERROR [master/hdata4:16000] wal.WALProcedureStore: Unable to close the stream
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): Client (=DFSClient_NONMAPREDUCE_-826562846_1) is not the lease owner (=DFSClient_NONMAPREDUCE_-1195801889_1: /apps/hbase/data/MasterProcWALs/pv2-00000000000000000011.log (inode 10368947) Holder DFSClient_NONMAPREDUCE_-826562846_1 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2837)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFileInternal(FSDirWriteFileOp.java:685)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFile(FSDirWriteFileOp.java:671)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2858)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:928)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:607)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)


        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
        at org.apache.hadoop.ipc.Client.call(Client.java:1443)
        at org.apache.hadoop.ipc.Client.call(Client.java:1353)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy18.complete(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:550)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy19.complete(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
        at com.sun.proxy.$Proxy20.complete(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
        at com.sun.proxy.$Proxy20.complete(Unknown Source)

2018-11-14 16:15:03,071 INFO  [master/hdata4:16000] hbase.ChoreService: Chore service for: master/hdata4:16000.splitLogManager. had [] on shutdown
2018-11-14 16:15:03,071 INFO  [master/hdata4:16000] flush.MasterFlushTableProcedureManager: stop: server shutting down.
2018-11-14 16:15:03,071 ERROR [master/hdata4:16000] access.TableAuthManager: Something wrong with the TableAuthManager reference counting: org.apache.hadoop.hbase.security.access.TableAuthManager@7e83992 whose count is null

ZooKeeper znode listings for /hbase-unsecure and /hbase-secure. The master znode is missing from /hbase-secure:

[zk: hdata2.local:2181(CONNECTED) 0] ls /hbase-unsecure
[replication, meta-region-server, rs, splitWAL, backup-masters, table-lock, flush-table-proc, master-maintenance, online-snapshot, master, switch, running, draining, namespace, hbaseid, table]

[zk: hdata2.local:2181(CONNECTED) 0] ls /hbase-secure
[replication, rs, splitWAL, backup-masters, table-lock, flush-table-proc, master-maintenance, online-snapshot, switch, running, tokenauth, draining, hbaseid, table]
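Comparing the two listings as sets makes the difference explicit. Besides master, the meta-region-server and namespace znodes are also absent from /hbase-secure, while tokenauth exists only there; this is an observation from the posted listings, not a diagnosis. A minimal sketch; parse_ls is a hypothetical helper that assumes zkCli's bracketed, comma-separated ls output:

```python
# Minimal sketch: compare the two `ls` listings posted in the thread to
# see which child znodes differ between /hbase-unsecure and /hbase-secure.
# The listings are copied verbatim; parse_ls is a hypothetical helper.
from __future__ import annotations

UNSECURE = ("[replication, meta-region-server, rs, splitWAL, backup-masters, "
            "table-lock, flush-table-proc, master-maintenance, online-snapshot, "
            "master, switch, running, draining, namespace, hbaseid, table]")
SECURE = ("[replication, rs, splitWAL, backup-masters, table-lock, "
          "flush-table-proc, master-maintenance, online-snapshot, switch, "
          "running, tokenauth, draining, hbaseid, table]")


def parse_ls(listing: str) -> set[str]:
    """Turn zkCli's bracketed, comma-separated `ls` output into a set."""
    return {name.strip() for name in listing.strip("[] \n").split(",") if name.strip()}


# Set differences in both directions, sorted for stable output.
missing_from_secure = sorted(parse_ls(UNSECURE) - parse_ls(SECURE))
only_in_secure = sorted(parse_ls(SECURE) - parse_ls(UNSECURE))

print("missing from /hbase-secure:", missing_from_secure)
print("only in /hbase-secure:", only_in_secure)
```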

Explorer

Some more logs

atlas
TABLE
Took 8.2656 seconds
java exception
ERROR Java::OrgApacheZookeeper::KeeperException::NoNodeException: KeeperErrorCode = NoNode for /hbase-secure/master
2018-11-14 17:27:34,307 - Retrying after 10 seconds. Reason: Execution of 'kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase-003@IIM.LOCAL; cat /var/lib/ambari-agent/tmp/atlas_hbase_setup.rb | hbase shell -n' returned 1.
atlas_janus
ATLAS_ENTITY_AUDIT_EVENTS
atlas
TABLE
Took 8.2337 seconds
java exception

ERROR Java::OrgApacheZookeeper::KeeperException::NoNodeException: KeeperErrorCode = NoNode for /hbase-secure/master
