Zookeeper kerberos issue or quorum issue?

Contributor

I run a Kerberized cluster, and once in a while I notice the following error in my ZooKeeper client logs:

 

15/11/15 15:46:53 ERROR client.ZooKeeperSaslClient: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.


15/11/15 15:46:53 ERROR zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.

 

This raises a question:

The underlying error appears to be a connection reset, but a reset of which connection? Is it the connection to the Kerberos KDC? The log also seems to indicate that the failure happened while connecting to a ZK quorum member. In that case, was the RST flag received from the ZK quorum member?

 

Thanks,

Sumit

1 ACCEPTED SOLUTION

Mentor
Yes, the "Mechanism level:" sub-codes usually pertain to operations involving a KDC or local Kerberos work. Since a connection reset is a network error, it points to the Client->KDC connection being reset.

The ZKs do authenticate to each other in secure mode, but the specific failure here is confined to the auth layer (rather than the higher levels of ZK connectivity and responses).
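
If the reset really is on the Client->KDC path, a quick reachability probe from the client host can help narrow it down. The following is only a rough sketch: the function name `kdc_tcp_reachable` and the host name in the usage comment are made up for illustration, and port 88 is the standard Kerberos port. A successful connect does not rule out intermittent resets, so it may be worth running this repeatedly around the time the errors appear.

```python
import socket


def kdc_tcp_reachable(host: str, port: int = 88, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper, not part of ZooKeeper or any Kerberos library.
    It only checks that the TCP handshake succeeds; it does not speak
    the Kerberos protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, resets, timeouts and DNS failures.
        return False


# Example call (placeholder host name, replace with your own KDC):
#   kdc_tcp_reachable("kdc.example.com")
```

The KDC host to test would normally be the one listed for your realm in the client's krb5.conf.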

Contributor

Thanks Harsh,

 

So, to generalize, the mechanism level subcodes can always be taken as some failure in communicating with KDC, right?

 

I also see that despite this error, ZK does continue to function ... so is this error to be really treated seriously?

 

Thanks again.

Mentor
> So, to generalize, the mechanism level subcodes can always be taken as some failure in communicating with KDC, right?

Yes, it can always be taken as something wrong in the Kerberos layer (not necessarily only the KDC; it could also be things such as bad enctypes in a keytab, but it is always Kerberos-mechanism related).
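
To see which Kerberos-layer sub-codes are actually showing up, one option is to pull the "Mechanism level:" text out of the client logs and tally it. This is a minimal sketch, assuming the sub-code always appears in the `(Mechanism level: ...)` form seen in the log lines above; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches "(Mechanism level: <text>)". The inner group tolerates one level
# of nested parentheses, since a sub-code may carry its own "(...)" suffix.
MECH_LEVEL = re.compile(r"\(Mechanism level: ((?:[^()]|\([^()]*\))*)\)")


def mechanism_level_subcodes(log_text: str) -> Counter:
    """Count each distinct 'Mechanism level' sub-code found in log_text."""
    return Counter(m.strip() for m in MECH_LEVEL.findall(log_text))


# Excerpt taken from the log line quoted in the question.
SAMPLE = (
    "15/11/15 15:46:53 ERROR client.ZooKeeperSaslClient: An error: "
    "(java.security.PrivilegedActionException: "
    "javax.security.sasl.SaslException: GSS initiate failed "
    "[Caused by GSSException: No valid credentials provided "
    "(Mechanism level: Connection reset)]) occurred when evaluating "
    "Zookeeper Quorum Member's received SASL token."
)

if __name__ == "__main__":
    for subcode, count in mechanism_level_subcodes(SAMPLE).items():
        print(f"{count:4d}  {subcode}")
```

A tally dominated by network-style messages would point toward the KDC connection, while other sub-codes would point toward local Kerberos configuration such as the keytab.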

> I also see that despite this error, ZK does continue to function ... so is this error to be really treated seriously?

Did a retry of the auth perhaps succeed? It's not normal for the errors to repeat.
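
One way to check whether the failure is a one-off or keeps repeating is to count the SASL failures per hour in the client log. This is a rough sketch, assuming the `yy/MM/dd HH:mm:ss` timestamp layout and the `client.ZooKeeperSaslClient` logger name seen in the excerpt above; the function name is made up for illustration. Only the `ZooKeeperSaslClient` lines are counted, because `ClientCnxn` logs the same failure a second time.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line layout: "15/11/15 15:46:53 ERROR client.ZooKeeperSaslClient: ..."
SASL_ERROR = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ERROR client\.ZooKeeperSaslClient:"
)


def sasl_failures_per_hour(lines) -> Counter:
    """Count AUTH_FAILED errors from ZooKeeperSaslClient, bucketed by hour."""
    counts = Counter()
    for line in lines:
        match = SASL_ERROR.match(line)
        if match and "AUTH_FAILED" in line:
            stamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
            # Truncate to the start of the hour to form the bucket key.
            counts[stamp.replace(minute=0, second=0)] += 1
    return counts


# Example use (the log path is a placeholder):
#   with open("zookeeper-client.log") as fh:
#       for hour, n in sorted(sasl_failures_per_hour(fh).items()):
#           print(hour, n)
```

An isolated count of one in a single hour is consistent with a retry having succeeded; steady counts across many hours suggest something is persistently wrong.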