Created 11-12-2018 02:20 PM
After enabling Kerberos, the HBase master fails to restart. On investigation I found that the /hbase-secure/master znode is missing in ZooKeeper.
ZooKeeper shows the following ACLs:
[zk: localhost:2181(CONNECTED) 1] getAcl /hbase-secure
'world,'anyone : r
'sasl,'hbase : cdrwa
'sasl,'hbase : cdrwa
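For reference, this is roughly how the znodes and ACLs can be inspected from an authenticated ZooKeeper CLI session on an HDP node (a sketch; the keytab path, principal and JAAS config location below are typical Ambari-managed defaults and may differ on your cluster):

# authenticate as the hbase service user
kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase-003@IIM.LOCAL
# make the ZooKeeper CLI use SASL with the HBase client JAAS config
export CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/usr/hdp/current/hbase-client/conf/hbase_client_jaas.conf"
zookeeper-client -server localhost:2181
[zk: localhost:2181(CONNECTED) 0] ls /hbase-secure
[zk: localhost:2181(CONNECTED) 1] getAcl /hbase-secure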
I have tried the suggestions in related threads, e.g.
https://community.hortonworks.com/articles/82405/how-to-remove-acl-protected-zk-node.html
Any ideas?
Created 11-14-2018 01:59 PM
Hi Mujeeb,
Could you please provide more details: the error you are receiving, the relevant part of the logs, and what you mean by "found that in zookeeper /hbase-secure/master node is missing"?
Please elaborate.
Regards,
AQ
Created 11-14-2018 04:31 PM
2018-11-14 16:15:03,009 WARN [master/hdata4:16000] master.ActiveMasterManager: Failed get of master address: java.io.IOException: Can't get master address from ZooKeeper; znode data == null
2018-11-14 16:15:03,009 INFO [master/hdata4:16000] assignment.AssignmentManager: Stopping assignment manager
2018-11-14 16:15:03,032 WARN [master/hdata4:16000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
2018-11-14 16:15:03,033 INFO [master/hdata4:16000] procedure2.RemoteProcedureDispatcher: Stopping procedure remote dispatcher
2018-11-14 16:15:03,033 INFO [master/hdata4:16000] procedure2.ProcedureExecutor: Stopping
2018-11-14 16:15:03,036 INFO [master/hdata4:16000] wal.WALProcedureStore: Stopping the WAL Procedure Store, isAbort=false
2018-11-14 16:15:03,070 ERROR [master/hdata4:16000] wal.WALProcedureStore: Unable to close the stream
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): Client (=DFSClient_NONMAPREDUCE_-826562846_1) is not the lease owner (=DFSClient_NONMAPREDUCE_-1195801889_1: /apps/hbase/data/MasterProcWALs/pv2-00000000000000000011.log (inode 10368947) Holder DFSClient_NONMAPREDUCE_-826562846_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2837)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFileInternal(FSDirWriteFileOp.java:685)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFile(FSDirWriteFileOp.java:671)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2858)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:928)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:607)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
at org.apache.hadoop.ipc.Client.call(Client.java:1443)
at org.apache.hadoop.ipc.Client.call(Client.java:1353)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy18.complete(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:550)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy19.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
at com.sun.proxy.$Proxy20.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
at com.sun.proxy.$Proxy20.complete(Unknown Source)
2018-11-14 16:15:03,071 INFO [master/hdata4:16000] hbase.ChoreService: Chore service for: master/hdata4:16000.splitLogManager. had [] on shutdown
2018-11-14 16:15:03,071 INFO [master/hdata4:16000] flush.MasterFlushTableProcedureManager: stop: server shutting down.
2018-11-14 16:15:03,071 ERROR [master/hdata4:16000] access.TableAuthManager: Something wrong with the TableAuthManager reference counting: org.apache.hadoop.hbase.security.access.TableAuthManager@7e83992 whose count is null
ZooKeeper znode listings for the unsecure and secure paths; the master znode is missing from /hbase-secure:
[zk: hdata2.local:2181(CONNECTED) 0] ls /hbase-unsecure
[replication, meta-region-server, rs, splitWAL, backup-masters, table-lock, flush-table-proc, master-maintenance, online-snapshot, master, switch, running, draining, namespace, hbaseid, table]
[zk: hdata2.local:2181(CONNECTED) 0] ls /hbase-secure
[replication, rs, splitWAL, backup-masters, table-lock, flush-table-proc, master-maintenance, online-snapshot, switch, running, tokenauth, draining, hbaseid, table]
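Side note: /hbase-secure/master is an ephemeral znode that the active HMaster creates when it starts up, so its absence here just means no master has managed to register; the "znode data == null" warning above is the same symptom. A quick check from an authenticated zkCli session looks roughly like this (the exact "Node does not exist" wording may vary by ZooKeeper version):

[zk: hdata2.local:2181(CONNECTED) 0] stat /hbase-secure/master
Node does not exist: /hbase-secure/master
# once a master becomes active, get /hbase-secure/master returns its serialized address instead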
Created 11-14-2018 05:29 PM
Some more logs
atlas
TABLE
Took 8.2656 seconds
java exception
ERROR Java::OrgApacheZookeeper::KeeperException::NoNodeException: KeeperErrorCode = NoNode for /hbase-secure/master

2018-11-14 17:27:34,307 - Retrying after 10 seconds. Reason: Execution of 'kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase-003@IIM.LOCAL; cat /var/lib/ambari-agent/tmp/atlas_hbase_setup.rb | hbase shell -n' returned 1.

atlas_janus
ATLAS_ENTITY_AUDIT_EVENTS
atlas
TABLE
Took 8.2337 seconds
java exception
ERROR Java::OrgApacheZookeeper::KeeperException::NoNodeException: KeeperErrorCode = NoNode for /hbase-secure/master
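To debug this outside of Ambari, the failing step can be re-run by hand on the HBase master node; the command below is copied from the retry message above, so substitute your own keytab, principal and script path:

kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase-003@IIM.LOCAL
cat /var/lib/ambari-agent/tmp/atlas_hbase_setup.rb | hbase shell -n
# a non-zero exit code plus the NoNode error points at the missing /hbase-secure/master znode
echo $?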
Created 11-15-2018 12:35 PM
A solution for desperate souls like me:
I copied atlas-application.properties from /etc/atlas/conf to /etc/hbase/conf, changed its permissions to 744 and its ownership to atlas:hadoop.
Then restarted HBase and Atlas.
Don't know if it's the right fix, but it worked! I can laugh again!
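For anyone who wants the exact commands, this is roughly what the fix amounts to; the paths are the standard HDP config directories, so adjust them if your layout differs:

# expose the Atlas hook configuration to HBase
cp /etc/atlas/conf/atlas-application.properties /etc/hbase/conf/
chmod 744 /etc/hbase/conf/atlas-application.properties
chown atlas:hadoop /etc/hbase/conf/atlas-application.properties
# then restart HBase and Atlas (e.g. from Ambari)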