New Contributor
Posts: 5
Registered: ‎07-06-2018

Kerberos slave - high availability

Hi,

We have Kerberos configured in our Hadoop cluster.
We enabled it with the Cloudera Manager wizard (https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cm_sg_intro_kerb.html), and it works well.

We want high availability, so we configured a secondary KDC server, following the Kerberos documentation.
Credentials are replicated from the first Kerberos server to the second (as described in this article: https://community.hortonworks.com/articles/92333/configure-two-kerberos-kdcs-as-a-masterslave.html).
We updated the Kerberos configuration in Cloudera Manager (v5.14) to add the secondary KDC server. The /etc/krb5.conf generated by Cloudera Manager contains:

[realms]
XXXXXX.COM = {
kdc = master1.com
admin_server = master1.com
kdc = worker1.com
}


We have the following configuration:
master1 : Kerberos server + Namenode (active) HDFS
worker1 : Kerberos server + Namenode HDFS
worker2 : Kerberos client + Datanode HDFS

 


We are testing the replication of Kerberos.

Case 1 : stop Kerberos server (kdc + kadmin) on master1 and init user ticket on worker2 with kinit

It works well.

Case 2 : stop Kerberos server (kdc + kadmin) and Namenode HDFS on master1 (to simulate the crash of the server master1)

Normally, the NameNode on worker1 should become active. Instead, worker1 reports an error: "This role's process exited. This role is supposed to be started."
Message in the log:

PriviledgedActionException as:hdfs/worker1.com@XXXXXX.COM (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Connection refused (Connection refused))

 

Conclusion/Question

So my conclusion is that the NameNode on worker1 doesn't use the secondary KDC (nothing appears in kadmin.log on worker1).
But a manual kinit works, so it is not a problem with Kerberos itself.

If the server hosting the main Kerberos KDC crashes, the Hadoop services crash too. This is a big problem.
Do you have a solution? Or any suggestion?

 

PS: I have already asked in this topic: http://community.cloudera.com/t5/Cloudera-Manager-Installation/kerberos-High-Availability/m-p/77651#..., but maybe it is better to create a new post.


Thank you,
Martin.

Master
Posts: 315
Registered: ‎07-01-2015

Re: Kerberos slave - high availability

Hi,

Try changing /etc/krb5.conf to:

kdc = master1.com worker1.com

Also, are you using HDFS in HA mode with JournalNodes? What is in the JournalNode logs?

Posts: 922
Topics: 1
Kudos: 213
Solutions: 115
Registered: ‎04-22-2014

Re: Kerberos slave - high availability

@martinbo,

 

Actually, the syntax suggested is not correct for recent releases of MIT Kerberos and will likely cause worse problems.

 

The syntax of your [realms] section is correct in using a separate kdc= for each kdc.

 

Please post the full /etc/krb5.conf file from your worker1.com host.

 

By default Java will use the following basic algorithm:

 

- Try the first kdc for the realm (master1.com in your case)

- Wait up to 30 seconds for a response

- Try the next kdc listed (in order, which is worker1.com)

- Wait up to 30 seconds for a response

- If no response... fail.
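The steps above can be sketched as a small loop. This is only an illustration of the simplified order described in this post, not Java's actual implementation (which also has retry settings not covered here). The function names and the TCP probe on port 88 are assumptions made for the example.

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_probe(host: str, timeout_s: float, port: int = 88) -> bool:
    """Hypothetical reachability check: can we open a TCP connection to the KDC port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def first_responding_kdc(
    kdcs: Iterable[str],
    probe: Callable[[str, float], bool] = tcp_probe,
    timeout_s: float = 30.0,
) -> Optional[str]:
    """Try each KDC in the order listed; return the first that answers, else None."""
    for host in kdcs:
        if probe(host, timeout_s):
            return host
    return None


def worst_case_wait_s(kdc_count: int, timeout_s: float = 30.0) -> float:
    """Upper bound on waiting time if every listed KDC times out once."""
    return kdc_count * timeout_s
```

With two KDCs and the default 30-second wait, `worst_case_wait_s(2)` gives 60.0 seconds, which can be longer than a starting service is willing to wait. Passing a fake `probe` makes it easy to try the ordering logic without touching the network.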

 

Based on what you have explained, you may be experiencing the issue listed here:

https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_rn_known_issues.html#conce...

 

If you don't have kdc_timeout=3000 in the [libdefaults] section of the /etc/krb5.conf file on worker1.com, then add it and retry your scenario.
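To check quickly whether a host already has the setting, something like the following sketch could be used. The helper name and the sample file are made up for illustration (the `default_realm` line is an assumption, not taken from the poster's real file), and the parser only handles simple `key = value` lines in a flat section.

```python
import re
from typing import Optional

# Illustrative sample only: realm and hosts come from the thread,
# default_realm is an assumed line.
SAMPLE_KRB5 = """\
[libdefaults]
default_realm = XXXXXX.COM
kdc_timeout = 3000

[realms]
XXXXXX.COM = {
  kdc = master1.com
  admin_server = master1.com
  kdc = worker1.com
}
"""


def libdefaults_setting(krb5_text: str, key: str) -> Optional[str]:
    """Return the value of `key` in [libdefaults], or None if it is not set."""
    section = None
    for raw in krb5_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blanks and comments
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            section = header.group(1)
            continue
        if section == "libdefaults" and "=" in line:
            name, value = line.split("=", 1)
            if name.strip() == key:
                return value.strip()
    return None


if __name__ == "__main__":
    print(libdefaults_setting(SAMPLE_KRB5, "kdc_timeout"))
```

On a real host you would read the text from /etc/krb5.conf instead of using the sample string.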

 

 

Master
Posts: 315
Registered: ‎07-01-2015

Re: Kerberos slave - high availability


@bgooley,
You are right. I have kdc = host1 host2
in my krb5.conf files, and it works because the first host is available.
So I should change it to:
kdc = host1
kdc = host2

Thanks!

Posts: 922
Topics: 1
Kudos: 213
Solutions: 115
Registered: ‎04-22-2014

Re: Kerberos slave - high availability

@Tomas79,

 

Some implementations do (or did) support both formats, but for best results use a separate kdc= entry on each line, which should work with all current Kerberos clients.
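If many hosts carry the space-separated form, the rewrite to one kdc= per line could be scripted. Below is a minimal sketch under the assumption that each entry sits on its own line as `kdc = host1 host2`; the function name is hypothetical, and the result should be reviewed before replacing any real krb5.conf.

```python
import re


def split_kdc_lines(krb5_text: str) -> str:
    """Rewrite 'kdc = a b' lines as one 'kdc = ...' line per host.

    Other lines are left unchanged. A trailing newline, if any, is not preserved.
    """
    out = []
    for line in krb5_text.splitlines():
        match = re.match(r"^(\s*)kdc\s*=\s*(.+?)\s*$", line)
        if match and len(match.group(2).split()) > 1:
            indent = match.group(1)
            # Keep the original order, since clients try KDCs in the order listed.
            out.extend(f"{indent}kdc = {host}" for host in match.group(2).split())
        else:
            out.append(line)
    return "\n".join(out)
```

For example, the line `kdc = host1 host2` becomes two lines, `kdc = host1` followed by `kdc = host2`, while `admin_server = host1` is left alone.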

 

 
