We are using zookeeper-3.4.6.jar and hbase-client-1.4.5.jar to connect to an HBase cluster in a Kerberized environment. Everything works fine, except that once in a while we get the following error:
ERROR ZooKeeperSaslClient:384 % An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection refused (Connection refused))]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state
ERROR ClientCnxn:1015 % SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection refused (Connection refused))]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state
My concern is that the above error is not thrown to the application; it is only logged by the library, so we can't retry at the application level. In a case like this, how can we retry, for example by creating the Connection again?
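One way to do the "create the Connection again" part at the application level is sketched below, assuming the failure eventually surfaces as an exception from the HBase call: keep one connection, and when an operation fails, close it, create a new one and retry a bounded number of times. It is written against AutoCloseable so it compiles without HBase; ReconnectingRetry, Factory and Operation are hypothetical names, not part of any library.

```java
import java.util.Objects;

/**
 * Hypothetical helper: runs an operation against a resource (e.g. an HBase
 * Connection) and, when the operation fails, closes that resource, creates a
 * fresh one and tries again, up to maxAttempts times.
 */
final class ReconnectingRetry<R extends AutoCloseable> {

    /** Creates a new resource; may throw, like a connection factory. */
    interface Factory<C> {
        C create() throws Exception;
    }

    /** Work to perform with the resource. */
    interface Operation<C, T> {
        T apply(C resource) throws Exception;
    }

    private final Factory<R> factory;
    private final int maxAttempts;
    private final long backoffMillis;
    private R current; // null until first use, and again after a failure

    ReconnectingRetry(Factory<R> factory, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.factory = Objects.requireNonNull(factory);
        this.maxAttempts = maxAttempts;
        this.backoffMillis = backoffMillis;
    }

    synchronized <T> T call(Operation<R, T> op) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (current == null) {
                    current = factory.create(); // (re)connect lazily
                }
                return op.apply(current);
            } catch (Exception e) {
                last = e;
                discardCurrent(); // never reuse a connection that just failed
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt); // linear backoff between attempts
                }
            }
        }
        throw last; // all attempts failed: surface the most recent error
    }

    private void discardCurrent() {
        if (current == null) {
            return;
        }
        try {
            current.close();
        } catch (Exception ignored) {
            // best effort: the old resource is being thrown away anyway
        } finally {
            current = null;
        }
    }
}
```

In the application, the factory would presumably wrap ConnectionFactory.createConnection(conf) and the operation would be the existing scan code taking the Connection as its argument. How long an HBase call keeps retrying internally before an exception reaches this code depends on client settings such as hbase.client.retries.number, so this only helps when the call actually fails rather than blocking.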
After that SASL error, the application goes into a state where the following exception is logged continuously:
2022-02-21 10:00:43 WARN ZKUtil:637 % hconnection-0x55182842-0x47f15b1a977331e, quorum=cmvp9k0e.prd.cm.par.emea.cib:2181,cmvp9k0h.prd.cm.par.emea.cib:2181,cmvp9k0i.prd.cm.par.emea.cib:2181,cmvp9k0j.prd.cm.par.emea.cib:2181,cmvp9k0k.prd.cm.par.emea.cib:2181, baseZNode=/hbase-secure Unable to get data of znode /hbase-secure/meta-region-server
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase-secure/meta-region-server
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:629)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:487)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:168)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:607)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:588)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:561)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1254)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1221)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:356)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:277)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:438)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:312)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1327)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1224)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:356)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:277)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:438)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:312)
at com.test.rkp.model.dao.EmptySchedulerRunDao.filterLatest(EmptySchedulerRunDao.java:45)
at com.test.rkp.runnable.EmptyReportSchedulerHelper.getLastSavedRunTime(EmptyReportSchedulerHelper.java:60)
at com.test.rkp.runnable.EmptyReportScheduler.main(EmptyReportScheduler.java:86)
2022-02-21 10:00:43 ERROR ZooKeeperWatcher:734 % hconnection-0x55182842-0x47f15b1a977331e, quorum=cmvp9k0e.prd.cm.par.emea.cib:2181,cmvp9k0h.prd.cm.par.emea.cib:2181,cmvp9k0i.prd.cm.par.emea.cib:2181,cmvp9k0j.prd.cm.par.emea.cib:2181,cmvp9k0k.prd.cm.par.emea.cib:2181, baseZNode=/hbase-secure Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase-secure/meta-region-server
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:629)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:487)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:168)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:607)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:588)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:561)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1254)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1221)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:356)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:277)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:438)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:312)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1327)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1224)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:356)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:277)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:438)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:312)
at com.test.rkp.model.dao.EmptySchedulerRunDao.filterLatest(EmptySchedulerRunDao.java:45)
at com.test.rkp.runnable.EmptyReportSchedulerHelper.getLastSavedRunTime(EmptyReportSchedulerHelper.java:60)
at com.test.rkp.runnable.EmptyReportScheduler.main(EmptyReportScheduler.java:86)
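Both traces show KeeperException$AuthFailedException raised underneath the application's own ClientScanner.next call in EmptySchedulerRunDao.filterLatest. If an exception does reach the DAO, possibly wrapped by HBase, a check like the sketch below could decide whether to treat it as "rebuild the connection". It matches by the class name and by the KeeperErrorCode text seen in these logs; whether HBase actually propagates the failure in either form is an assumption to verify, and ZkAuthFailure is a hypothetical name.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/**
 * Hypothetical helper: decides whether a failure looks like the ZooKeeper
 * AuthFailed condition from the logs, by walking the cause chain.
 */
final class ZkAuthFailure {
    // Binary class name from the stack trace; compared as a string so this
    // compiles without the ZooKeeper jar on the classpath.
    static final String AUTH_FAILED_CLASS =
            "org.apache.zookeeper.KeeperException$AuthFailedException";
    // Message text from the same trace, for wrappers that keep only the message.
    static final String AUTH_FAILED_TEXT = "KeeperErrorCode = AuthFailed";

    static boolean matches(Throwable t) {
        // Identity set guards against (rare) cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (AUTH_FAILED_CLASS.equals(cur.getClass().getName())) {
                return true;
            }
            String msg = cur.getMessage();
            if (msg != null && msg.contains(AUTH_FAILED_TEXT)) {
                return true;
            }
        }
        return false;
    }
}
```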
Created 03-01-2022 07:09 PM
Are you specifying a jaas.conf file for your application, through the java.security.auth.login.config property?
If so, could you please provide the configuration in it?
If not, create a file with your authentication details (principal and keytab), like the example below, save it in a location the application can read, and pass its path to the application with -Djava.security.auth.login.config=/path/to/jaas.conf
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/path/to/appuser.keytab"
storeKey=true
useTicketCache=false
principal="appuser@YOUR-REALM";
};
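Problems in a JAAS file usually only show up when the client tries to log in, so it can help to check the file up front. The sketch below loads a file through the JDK's standard JAAS parser (Configuration.getInstance with the "JavaLoginConfig" type and a java.security.URIParameter) and returns the options of one entry, so you can confirm the Client section, the one the ZooKeeper client uses by default, reads as intended. JaasCheck and entryOptions are hypothetical names; a file with syntax errors is expected to make getInstance throw.

```java
import java.nio.file.Path;
import java.security.NoSuchAlgorithmException;
import java.security.URIParameter;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

/** Hypothetical helper: parse a JAAS file with the JDK parser and inspect one entry. */
final class JaasCheck {

    /**
     * Returns the options of the first login module in the named entry
     * (e.g. "Client"), or null if the file has no such entry.
     */
    static Map<String, ?> entryOptions(Path jaasFile, String entryName)
            throws NoSuchAlgorithmException {
        // Parses only this file, independent of java.security.auth.login.config.
        Configuration config = Configuration.getInstance(
                "JavaLoginConfig", new URIParameter(jaasFile.toUri()));
        AppConfigurationEntry[] entries = config.getAppConfigurationEntry(entryName);
        if (entries == null || entries.length == 0) {
            return null;
        }
        return entries[0].getOptions(); // e.g. useKeyTab, keyTab, principal
    }
}
```

For example, JaasCheck.entryOptions(Paths.get("/path/to/jaas.conf"), "Client") should return useKeyTab, keyTab and principal with the values written in the file.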
Cheers,
André
Created 03-01-2022 07:38 PM
Hi @araujo,
Yes, we are providing the jaas.conf:
Client {
com.sun.security.auth.module.Krb5LoginModule required
debug = true
doNotPrompt = true
storeKey = true
useKeyTab = true
useTicketCache = false
principal = "tst@BK.DFN"
keyTab = "/etc/security/keytabs/tst.keytab"
serviceName = "zookeeper";
};
I don't think the issue is related to jaas.conf, since the application works fine most of the time. But we get the error below about twice a month, and after it the application does not recover.
Thanks.
2022-02-21 10:00:03 ERROR ZooKeeperSaslClient:384 % An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection refused (Connection refused))]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
2022-02-21 10:00:03 ERROR ClientCnxn:1015 % SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection refused (Connection refused))]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
Created 03-01-2022 08:49 PM
@Dilan86 ,
If the application is not being affected and you only see this a few times a month, I wouldn't worry about it. It might be caused by an intermittent issue, such as a network connectivity glitch.
Unless it's causing problems, I would just ignore it.
André
Created 03-01-2022 09:06 PM
Hi @araujo ,
The problem is that after getting the error, our application does not recover. Do you have any suggestions for catching the error below in the application and retrying the connection?
Thanks
2022-02-21 10:00:03 ERROR ClientCnxn:1015 % SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection refused (Connection refused))]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
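Because the SASL failure itself is only logged, what the application can act on is the HBase call that follows it, and the earlier trace shows that call sitting inside MetaTableLocator.blockUntilAvailable. A sketch, assuming the goal is to regain control when a call takes too long: run it on an executor with a deadline, then treat a timeout like any other failure, for example by closing and recreating the Connection. BoundedCall is a hypothetical name, and whether the blocked HBase thread honors the interrupt from cancel(true) is not guaranteed; HBase's own hbase.client.operation.timeout setting is worth checking as well.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: run a possibly-blocking call with an upper bound on waiting. */
final class BoundedCall {

    static <T> T callWithTimeout(ExecutorService executor, Callable<T> task,
                                 long timeout, TimeUnit unit) throws Exception {
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; the call may or may not stop
            throw e;             // caller decides, e.g. recreate the Connection and retry
        } catch (ExecutionException e) {
            Throwable cause = e.getCause(); // rethrow what the task itself threw
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            if (cause instanceof Error) {
                throw (Error) cause;
            }
            throw e;
        }
    }
}
```

Since a call that ignores the interrupt keeps its worker thread busy, a dedicated, bounded executor for these calls is probably safer than a shared one.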
Created 03-01-2022 09:19 PM
@Dilan86 ,
You can try enabling Kerberos debug logging and waiting for it to happen again:
-Dsun.security.krb5.debug=true
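If changing the JVM command line is awkward, the same switch can be set from code, provided it runs before anything touches Kerberos: the JDK reads it when its Kerberos classes initialize, so setting it later may have no effect. The sketch also sets sun.security.jgss.debug, which adds detail from the GSS-API layer that reports "GSS initiate failed". KerberosDebug is a hypothetical name, and the -D flag remains the more reliable option.

```java
/** Hypothetical helper: turn on JDK Kerberos and GSS-API debug output from code. */
final class KerberosDebug {

    /** Call first thing in main(), before any HBase, ZooKeeper or Kerberos class is used. */
    static void enable() {
        System.setProperty("sun.security.krb5.debug", "true"); // same as -Dsun.security.krb5.debug=true
        System.setProperty("sun.security.jgss.debug", "true"); // GSS-API layer details
    }
}
```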
What kind of application is this?
Is it a long running application?
Have you noticed any patterns like "the application fails after running for x hours/days"?
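To answer the "fails after x hours/days" question with data, the application could record when its current connection was created and include the connection's age in the error log; ages clustering near the Kerberos ticket lifetime (ticket_lifetime in krb5.conf) would be a hint worth following. A minimal sketch with a hypothetical ConnectionAge class; the time source is injectable only so the class can be tested.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical helper: remember when the current connection was created. */
final class ConnectionAge {
    private final Supplier<Instant> now; // Instant::now in production
    private volatile Instant createdAt;

    ConnectionAge() {
        this(Instant::now);
    }

    ConnectionAge(Supplier<Instant> now) {
        this.now = now;
        markCreated();
    }

    /** Call right after the connection is (re)created. */
    void markCreated() {
        createdAt = now.get();
    }

    Duration age() {
        return Duration.between(createdAt, now.get());
    }

    /** Text to append to the error log when a failure is detected. */
    String describe() {
        Duration d = age();
        return String.format("connection age %dh %02dm (created %s)",
                d.toHours(), d.toMinutes() % 60, createdAt);
    }
}
```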
André