Zookeeper kerberos issue or quorum issue?

Contributor

I use a kerberized cluster, and once in a while I notice the following errors in my ZooKeeper client logs:

 

15/11/15 15:46:53 ERROR client.ZooKeeperSaslClient: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.


15/11/15 15:46:53 ERROR zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.

 

This raised the following question:

The underlying error is reported as a connection reset, but I am not sure which connection was reset. Was it the connection to the Kerberos KDC? The rest of the log message seems to indicate that the failure happened while connecting to a ZK quorum member. In that case, was the RST flag received from the ZK quorum member?

 

Thanks,

Sumit

Accepted Solutions

Re: Zookeeper kerberos issue or quorum issue?

Master Guru
Yes, the "Mechanism level:" sub-codes usually pertain to operations involving a KDC or local Kerberos work. Since the connection reset is a network error, it points to the Client->KDC connection being reset.

The ZKs would auth to each other in secure mode, but the specific failure here is within just the auth layer (rather than the higher levels of ZK connectivity and responses).
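
To make the point about "Mechanism level:" sub-codes concrete, here is a minimal sketch that pulls the sub-code out of a client log line such as the ones in the question. The function name `mechanism_level` is hypothetical and only for illustration; it assumes the sub-code is wrapped in parentheses, as in the logs above.

```python
def mechanism_level(log_line):
    """Return the text after '(Mechanism level: ' up to its matching ')', or None."""
    marker = "(Mechanism level: "
    start = log_line.find(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 0  # tracks parentheses nested inside the sub-code text
    for i in range(begin, len(log_line)):
        ch = log_line[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                return log_line[begin:i]
            depth -= 1
    return log_line[begin:]  # unbalanced line: return whatever follows the marker


# Log line taken from the question (split across string literals for readability).
line = ("15/11/15 15:46:53 ERROR client.ZooKeeperSaslClient: An error: "
        "(java.security.PrivilegedActionException: javax.security.sasl.SaslException: "
        "GSS initiate failed [Caused by GSSException: No valid credentials provided "
        "(Mechanism level: Connection reset)]) occurred when evaluating Zookeeper "
        "Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.")

print(mechanism_level(line))  # prints: Connection reset
```

The extracted text ("Connection reset" here) is the part that describes what went wrong inside the Kerberos layer; the surrounding SASL and ZooKeeper wording only reports where the failure surfaced.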

Re: Zookeeper kerberos issue or quorum issue?

Contributor

Thanks Harsh,

 

So, to generalize, the mechanism-level sub-codes can always be taken as some failure in communicating with the KDC, right?

I also see that despite this error, ZK does continue to function ... so should this error really be treated seriously?

 

Thanks again.

Re: Zookeeper kerberos issue or quorum issue?

Master Guru
> So, to generalize, the mechanism-level sub-codes can always be taken as some failure in communicating with the KDC, right?

Yes, it can always be taken as something wrong in the Kerberos layer (not necessarily only the KDC; it could also be things such as bad enctypes in the keytab, but always Kerberos-mechanism related).
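
One rough way to separate the KDC-connectivity case from the other Kerberos-layer causes is to check whether the client host can open a TCP connection to the KDC at all. The sketch below assumes the KDC listens on the standard Kerberos port 88 over TCP; the helper name `can_open_tcp` and the host `kdc.example.com` are placeholders, not values from this thread. A successful connect does not rule out intermittent resets, so treat it as a first check only.

```python
import socket


def can_open_tcp(host, port=88, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, resets, timeouts and DNS failures.
        return False


# Example call (placeholder host name, replace with your own KDC):
# print(can_open_tcp("kdc.example.com"))
```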

> I also see that despite this error, ZK does continue to function ... so should this error really be treated seriously?

Did a retry of the auth perhaps succeed? It's not normal for the errors to repeat.
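
To see whether the errors repeat, one option is to count the AUTH_FAILED lines in the client log by timestamp. This is a minimal sketch, assuming each log line starts with a date and a time separated by spaces, as in the question; the function name `auth_failures_by_timestamp` is hypothetical, and the sample lines are shortened from the ones posted above.

```python
from collections import Counter


def auth_failures_by_timestamp(lines):
    """Count lines mentioning AUTH_FAILED, keyed by their leading 'date time' stamp."""
    counts = Counter()
    for line in lines:
        if "AUTH_FAILED" in line:
            parts = line.split(maxsplit=2)
            if len(parts) >= 2:
                counts[" ".join(parts[:2])] += 1
    return counts


# Shortened versions of the two log lines from the question.
sample = [
    "15/11/15 15:46:53 ERROR client.ZooKeeperSaslClient: Zookeeper Client will go to AUTH_FAILED state.",
    "15/11/15 15:46:53 ERROR zookeeper.ClientCnxn: Zookeeper Client will go to AUTH_FAILED state.",
]

counts = auth_failures_by_timestamp(sample)
print(len(counts))  # prints: 1
```

In the sample, both lines share one timestamp, so they describe a single failure reported by two classes. Many distinct timestamps in a real log would suggest that the failure keeps recurring rather than being cleared by a retry.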