Created 11-12-2018 02:20 PM
After enabling Kerberos, the HBase Master fails to restart. On investigation I found that the /hbase-secure/master znode is missing in ZooKeeper.
ZooKeeper shows the following ACL:
[zk: localhost:2181(CONNECTED) 1] getAcl /hbase-secure
'world,'anyone
: r
'sasl,'hbase
: cdrwa
'sasl,'hbase
: cdrwa
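For reference, this is roughly how I am connecting to ZooKeeper to inspect the protected znode. The keytab, principal, JAAS file, and zkCli path below follow common HDP defaults, so treat them as assumptions for other clusters:

# Assumption: HDP-style keytab/JAAS/client locations; adjust to your cluster.
kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase-003@IIM.LOCAL
export CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/etc/zookeeper/conf/zookeeper_client_jaas.conf"
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:2181
# inside the zkCli prompt:
#   getAcl /hbase-secure   -> shows the 'sasl,'hbase : cdrwa entries above
#   ls /hbase-secure       -> the master child znode is not listed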
I have already tried the suggestions from related threads, e.g.
https://community.hortonworks.com/articles/82405/how-to-remove-acl-protected-zk-node.html
Any ideas?
Created 11-15-2018 12:35 PM
A solution for desperate souls like me:
I copied atlas-application.properties from /etc/atlas/conf to /etc/hbase/conf, changed its permissions to 744, and changed its ownership to atlas:hadoop.
Then restart HBase and Atlas.
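Roughly, the commands I ran (assuming the default HDP config directories; adjust paths to your install):

# Assumption: default HDP config locations; adjust to your layout.
cp /etc/atlas/conf/atlas-application.properties /etc/hbase/conf/
chmod 744 /etc/hbase/conf/atlas-application.properties
chown atlas:hadoop /etc/hbase/conf/atlas-application.properties
# then restart HBase and Atlas (e.g. from Ambari)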
I don't know if it is the right fix, but it worked! I can laugh again!
Created 11-14-2018 01:59 PM
Hi Mujeeb,
Could you please provide more details, such as the error you are receiving and the relevant part of the logs? Also, what do you mean by "found that in zookeeper /hbase-secure/master node is missing"?
Please elaborate.
Regards,
AQ
Created 11-14-2018 04:31 PM
2018-11-14 16:15:03,009 WARN [master/hdata4:16000] master.ActiveMasterManager: Failed get of master address: java.io.IOException: Can't get master address from ZooKeeper; znode data == null
2018-11-14 16:15:03,009 INFO [master/hdata4:16000] assignment.AssignmentManager: Stopping assignment manager
2018-11-14 16:15:03,032 WARN [master/hdata4:16000] assignment.AssignmentManager: No servers available; cannot place 1 unassigned regions.
2018-11-14 16:15:03,033 INFO [master/hdata4:16000] procedure2.RemoteProcedureDispatcher: Stopping procedure remote dispatcher
2018-11-14 16:15:03,033 INFO [master/hdata4:16000] procedure2.ProcedureExecutor: Stopping
2018-11-14 16:15:03,036 INFO [master/hdata4:16000] wal.WALProcedureStore: Stopping the WAL Procedure Store, isAbort=false
2018-11-14 16:15:03,070 ERROR [master/hdata4:16000] wal.WALProcedureStore: Unable to close the stream
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): Client (=DFSClient_NONMAPREDUCE_-826562846_1) is not the lease owner (=DFSClient_NONMAPREDUCE_-1195801889_1: /apps/hbase/data/MasterProcWALs/pv2-00000000000000000011.log (inode 10368947) Holder DFSClient_NONMAPREDUCE_-826562846_1 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2837)
    at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFileInternal(FSDirWriteFileOp.java:685)
    at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFile(FSDirWriteFileOp.java:671)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2858)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:928)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:607)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
    at org.apache.hadoop.ipc.Client.call(Client.java:1443)
    at org.apache.hadoop.ipc.Client.call(Client.java:1353)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy18.complete(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:550)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy19.complete(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
    at com.sun.proxy.$Proxy20.complete(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
    at com.sun.proxy.$Proxy20.complete(Unknown Source)
2018-11-14 16:15:03,071 INFO [master/hdata4:16000] hbase.ChoreService: Chore service for: master/hdata4:16000.splitLogManager. had [] on shutdown
2018-11-14 16:15:03,071 INFO [master/hdata4:16000] flush.MasterFlushTableProcedureManager: stop: server shutting down.
2018-11-14 16:15:03,071 ERROR [master/hdata4:16000] access.TableAuthManager: Something wrong with the TableAuthManager reference counting: org.apache.hadoop.hbase.security.access.TableAuthManager@7e83992 whose count is null
ZooKeeper znode listings for the unsecure and secure setups; the master znode is missing from /hbase-secure:
[zk: hdata2.local:2181(CONNECTED) 0] ls /hbase-unsecure
[replication, meta-region-server, rs, splitWAL, backup-masters, table-lock, flush-table-proc, master-maintenance, online-snapshot, master, switch, running, draining, namespace, hbaseid, table]
[zk: hdata2.local:2181(CONNECTED) 0] ls /hbase-secure
[replication, rs, splitWAL, backup-masters, table-lock, flush-table-proc, master-maintenance, online-snapshot, switch, running, tokenauth, draining, hbaseid, table]
Created 11-14-2018 05:29 PM
Some more logs
atlas
TABLE
Took 8.2656 seconds
java exception
ERROR Java::OrgApacheZookeeper::KeeperException::NoNodeException: KeeperErrorCode = NoNode for /hbase-secure/master
2018-11-14 17:27:34,307 - Retrying after 10 seconds. Reason: Execution of 'kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase-003@IIM.LOCAL; cat /var/lib/ambari-agent/tmp/atlas_hbase_setup.rb | hbase shell -n' returned 1.
atlas_janus
ATLAS_ENTITY_AUDIT_EVENTS
atlas
TABLE
Took 8.2337 seconds
java exception
ERROR Java::OrgApacheZookeeper::KeeperException::NoNodeException: KeeperErrorCode = NoNode for /hbase-secure/master
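The step Ambari runs can also be replayed by hand to confirm it is the same missing-znode problem. The command below is lifted from the retry message above; the principal, keytab, and script path are specific to this cluster:

# Re-run the Ambari step manually (values copied from the log above).
kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase-003@IIM.LOCAL
cat /var/lib/ambari-agent/tmp/atlas_hbase_setup.rb | hbase shell -n
# exits 1 and prints the NoNode error for /hbase-secure/master while the
# master znode is absent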