
Continuous KeeperException and java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed

New Contributor

We are using zookeeper-3.4.6.jar and hbase-client-1.4.5.jar to connect to an HBase cluster in a Kerberized environment. Everything works fine, except that once in a while we get the following error.

 

ERROR ZooKeeperSaslClient:384 % An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection refused (Connection refused))]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state

 


ERROR ClientCnxn:1015 % SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection refused (Connection refused))]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state

 

My concern is that the above error is not thrown to the application level; it is only logged by the library, so we can't retry at the application level. In a case like this, how can we retry, for example by creating the Connection again?

 

After the above error, the application goes into a state where the exception below is logged continuously.

 

2022-02-21 10:00:43 WARN ZKUtil:637 % hconnection-0x55182842-0x47f15b1a977331e, quorum=cmvp9k0e.prd.cm.par.emea.cib:2181,cmvp9k0h.prd.cm.par.emea.cib:2181,cmvp9k0i.prd.cm.par.emea.cib:2181,cmvp9k0j.prd.cm.par.emea.cib:2181,cmvp9k0k.prd.cm.par.emea.cib:2181, baseZNode=/hbase-secure Unable to get data of znode /hbase-secure/meta-region-server
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase-secure/meta-region-server
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:629)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:487)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:168)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:607)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:588)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:561)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1254)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1221)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:356)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:277)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:438)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:312)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1327)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1224)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:356)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:277)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:438)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:312)
at com.test.rkp.model.dao.EmptySchedulerRunDao.filterLatest(EmptySchedulerRunDao.java:45)
at com.test.rkp.runnable.EmptyReportSchedulerHelper.getLastSavedRunTime(EmptyReportSchedulerHelper.java:60)
at com.test.rkp.runnable.EmptyReportScheduler.main(EmptyReportScheduler.java:86)
2022-02-21 10:00:43 ERROR ZooKeeperWatcher:734 % hconnection-0x55182842-0x47f15b1a977331e, quorum=cmvp9k0e.prd.cm.par.emea.cib:2181,cmvp9k0h.prd.cm.par.emea.cib:2181,cmvp9k0i.prd.cm.par.emea.cib:2181,cmvp9k0j.prd.cm.par.emea.cib:2181,cmvp9k0k.prd.cm.par.emea.cib:2181, baseZNode=/hbase-secure Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase-secure/meta-region-server
at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:629)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:487)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:168)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:607)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:588)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:561)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1254)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1221)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:356)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:277)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:438)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:312)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1327)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1224)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:356)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:277)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:438)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:312)
at com.test.rkp.model.dao.EmptySchedulerRunDao.filterLatest(EmptySchedulerRunDao.java:45)
at com.test.rkp.runnable.EmptyReportSchedulerHelper.getLastSavedRunTime(EmptyReportSchedulerHelper.java:60)
at com.test.rkp.runnable.EmptyReportScheduler.main(EmptyReportScheduler.java:86)

 

 


Master Collaborator

@Dilan86 

 

Are you specifying a jaas.conf file for your application, through the java.security.auth.login.config property?

If so, could you please provide the configuration in it?

 

If not, create a file with your authentication details (principal and keytab) like the example below, save it in a location that the application can read, and pass it to the application using -Djava.security.auth.login.config=/path/to/jaas.conf

 

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/path/to/appuser.keytab"
  storeKey=true
  useTicketCache=false
  principal="appuser@YOUR-REALM";
};
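
A quick way to confirm that the JVM actually picks up the "Client" section is to load it through the standard JAAS Configuration API. The following is only a minimal sketch: the class and method names (JaasClientCheck, loadClientOptions) are made up for illustration, and setting the property in code is assumed to be equivalent to passing -Djava.security.auth.login.config on the command line as long as it runs before any login is attempted. It only parses the file; it does not check the keytab or contact the KDC.

```java
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Hypothetical helper, not part of HBase or ZooKeeper.
final class JaasClientCheck {

    /**
     * Points the JVM at the given JAAS file and returns the options of the
     * first login module listed in its "Client" section.
     */
    static Map<String, ?> loadClientOptions(String jaasPath) {
        // Same effect as -Djava.security.auth.login.config=<jaasPath>
        System.setProperty("java.security.auth.login.config", jaasPath);

        Configuration config = Configuration.getConfiguration();
        config.refresh(); // re-read the file in case a configuration was already cached

        AppConfigurationEntry[] entries = config.getAppConfigurationEntry("Client");
        if (entries == null || entries.length == 0) {
            throw new IllegalStateException("No 'Client' section found in " + jaasPath);
        }
        return entries[0].getOptions();
    }
}
```

Calling JaasClientCheck.loadClientOptions("/path/to/jaas.conf") at startup and logging the returned principal and keyTab values shows which credentials the ZooKeeper client will try to use.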

 

Cheers,

André

 

 

--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.

New Contributor

hi @araujo,

 

yes we are providing the jaas.conf

 

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  debug = true
  doNotPrompt = true
  storeKey = true
  useKeyTab = true
  useTicketCache = false
  principal = "tst@BK.DFN"
  keyTab = "/etc/security/keytabs/tst.keytab"
  serviceName = "zookeeper"
}

 

I don't think the issue is related to jaas.conf, since the application works fine most of the time. But we get the error below about twice a month, and after it the application does not recover.

Thanks.

 

2022-02-21 10:00:03 ERROR ZooKeeperSaslClient:384 % An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection refused (Connection refused))]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
2022-02-21 10:00:03 ERROR ClientCnxn:1015 % SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection refused (Connection refused))]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.

 

 

Master Collaborator

@Dilan86 ,

 

If the application is not being affected and you only see this a few times a month, I wouldn't worry about it. It might be due to some intermittent issue, like network connectivity glitches, for example.

 

Unless it's hurting I would just ignore it.

 

André

 


New Contributor

Hi @araujo ,

The problem is that after this error our application does not recover. Do you have any suggestions for catching the error below in the application and retrying the connection?

 

Thanks

2022-02-21 10:00:03 ERROR ClientCnxn:1015 % SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection refused (Connection refused))]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
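
One possible shape for the retry asked about here, offered as a sketch under assumptions rather than a confirmed fix: as far as I know, a ZooKeeper 3.4 client handle that has entered AUTH_FAILED does not leave that state on its own, so the idea is to discard the whole connection and build a new one when an operation fails. The class below is hypothetical (ReconnectingClient, Operation and call are names invented for illustration) and is written against plain java.lang.AutoCloseable so it stands alone; with the HBase client the factory would presumably be something like () -> ConnectionFactory.createConnection(conf). It only helps if the failing call eventually throws, so a client-side operation timeout may also be needed when the library keeps retrying internally.

```java
import java.util.concurrent.Callable;

// Hypothetical wrapper: keeps one shared resource and rebuilds it after a failure.
final class ReconnectingClient<R extends AutoCloseable> {

    /** One unit of work against the resource, for example a scan or a get. */
    interface Operation<R, T> {
        T apply(R resource) throws Exception;
    }

    private final Callable<R> factory; // creates a fresh connection
    private R current;                 // connection in use, or null if discarded

    ReconnectingClient(Callable<R> factory) {
        this.factory = factory;
    }

    /** Runs op, recreating the resource and retrying up to maxAttempts times. */
    synchronized <T> T call(Operation<R, T> op, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (current == null) {
                    current = factory.call();
                }
                return op.apply(current);
            } catch (Exception e) {
                last = e;
                discard(); // the old handle may be stuck, so never reuse it
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt); // linear backoff
                }
            }
        }
        throw last;
    }

    /** Closes and forgets the current resource, ignoring close failures. */
    private void discard() {
        if (current != null) {
            try {
                current.close();
            } catch (Exception ignored) {
                // best effort: the resource is being thrown away anyway
            }
            current = null;
        }
    }
}
```

This sketch retries on every exception for simplicity; a real application would probably inspect the cause chain and rebuild the connection only for authentication or connection failures.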

Master Collaborator

@Dilan86 ,

 

You can try enabling Kerberos debug log and waiting for it to happen again:

-Dsun.security.krb5.debug=true

 

What kind of application is this?

Is it a long running application?

Have you noticed any patterns like "the application fails after running for x hours/days"?

 

André
