Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

kerberos High Availability

avatar
Explorer

I am planning to implement kerberos HA at this point we have one kdc server which is single poing of failure. Can anyone please guide me how to enable HA for kerberos.

11 REPLIES 11

avatar
Master Guru

Yes, check the Manage krb5.conf through Cloudera Manager box and Save.

You can then follow the steps that Katelynn explained.  Cloudera Manager will then overwrite your current /etc/krb5.conf when you Click "Deploy Kerberos Client Configuration" from the cluster menu on the front page of Cloudera Manager

avatar
Explorer

Hi,

We have Kerberos configured in our Hadoop cluster.
We did a Wizard installation (https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_sg_intro_kerb.html), it works well.

We try to have a high level of availability, we have configured a secondary kdc-server (we followed the kerberos documentation).
We have a replication of the credentials  from the first Kerberos server to the second (like in the topic : https://community.hortonworks.com/articles/92333/configure-two-kerberos-kdcs-as-a-masterslave.html)
We set Kerberos configuration on Cloudera Manager to add the secondary kdc server. The configuration generate by Cloudera in /etc/krb5.conf contains :

[realms]
XXXXXX.COM = {
kdc = master1.com
admin_server = master1.com
kdc = worker1.com
}


We have the following configuration:
master1 : Kerberos server + Namenode (active) HDFS
worker1 : Kerberos server + Namenode HDFS
worker2 : Kerberos client + Datanode HDFS

 


We are testing the replication of Kerberos.

Case 1 : stop Kerberos server (kdc + kadmin) on master1 and init user ticket on worker2 with kinit

It works well.

Case 2 : stop Kerberos server (kdc + kadmin) and Namenode HDFS on master1 (to simulate the crash of the server master1)

Normaly, the Namenode on worker1 should be activate. But, there is an error : "This role's process exited. This role is supposed to be started." on worker1.
Message in log:

PriviledgedActionException as:hdfs/worker1.com@XXXXXX.COM (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Connection refused (Connection refused))

 

Conclusion/Question

So my conclusion is that the Namenode on worker1 doesn't use the secondary kdc (there is nothing in the kadmin.log on the worker1).
But if I do a kinit manually, that works. So, is not a problem of Kerberos.

If the server with the main Kerberos kdc crash, the hadoop services crash too.. This is a big problem.
Do you have a solution? Or any suggestion?



Thank you,
Martin.