We ran some tests to see how usable a Kerberized cluster would be, and we have now decided to move on to a more production-like setup. We are starting fresh with a new realm, because the one we initially picked will not work with our final naming convention.
I uninstalled Kerberos and deleted all Kerberos databases. The hostnames have also been changed (including within Cloudera Manager) to FQDNs that match the new realm. I then recreated everything related to Kerberos, and kinit works correctly, but only on the machine running the KDC.
The configuration has also been updated in the Cloudera Manager interface:
- KDC server host
- Kerberos Security Realm
I have been trying to push the new /etc/krb5.conf manually, copying it to all hosts, in order to regenerate the credentials. If I disable "Manage krb5.conf through Cloudera Manager", kinit works correctly on those hosts. But as soon as I enable it, the old (wrong) krb5.conf is redeployed. And in any case, when I try to regenerate the credentials while krb5.conf is not managed by Cloudera Manager, it fails because some commands still assume the old realm.
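To see which realm a host is actually using after each push, one option is to read default_realm straight out of /etc/krb5.conf on every host. The following is a minimal Python sketch, assuming the standard MIT krb5.conf layout; it only looks at the [libdefaults] section and is not a full krb5.conf parser. NEWREALM and OLDREALM are the placeholders used in this thread.

```python
import re


def default_realm(krb5_conf_text):
    """Return default_realm from the [libdefaults] section, or None if it is not set."""
    section = None
    for raw in krb5_conf_text.splitlines():
        line = raw.strip()
        # krb5.conf comment lines start with '#' or ';'.
        if not line or line.startswith(("#", ";")):
            continue
        header = re.fullmatch(r"\[\s*([^\]]+?)\s*\]", line)
        if header:
            section = header.group(1)
            continue
        if section == "libdefaults":
            key, sep, value = line.partition("=")
            if sep and key.strip() == "default_realm":
                return value.strip()
    return None


def uses_realm(krb5_conf_text, expected_realm):
    """True if the file's default_realm is exactly expected_realm."""
    return default_realm(krb5_conf_text) == expected_realm
```

Reading each host's file before and after toggling "Manage krb5.conf through Cloudera Manager" and calling `uses_realm(text, "NEWREALM")` shows whether the old realm came back.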
I have been searching the Cloudera Manager host for this stale configuration, but I can't find where it is kept. Could you please help me and tell me how to make Cloudera Manager forget this old Kerberos realm?
I am a bit desperate here. I scanned the Cloudera Manager host quite exhaustively for occurrences of the old realm, and apart from log files, the only occurrences are in '/var/lib/cloudera-scm-server-db/'. Does this mean I should edit the database manually with psql? How do I do that?
All right, continuing my investigation, here is what I did:
I found information about how to access the internal database here
Basically, you'll find the database connection information in '/etc/cloudera-scm-server/db.properties' and '/var/lib/cloudera-scm-server-db/data/generated_password.txt'. From there, with a bit of PostgreSQL-fu, I managed to dump all tables of the 'scm' database as CSV files.
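The connection details can be turned into a psql invocation programmatically. A minimal sketch, assuming db.properties uses plain key=value lines and the key names com.cloudera.cmf.db.host, com.cloudera.cmf.db.name, com.cloudera.cmf.db.user and com.cloudera.cmf.db.password (check them against your own file). The password is deliberately kept out of the argument list and passed through the PGPASSWORD environment variable, which psql reads.

```python
def read_properties(text):
    """Parse simple key=value lines; '#' and '!' start comment lines.

    Java properties features such as ':' separators, escapes and line
    continuations are ignored; db.properties is assumed not to use them.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def psql_argv(props):
    """Build a psql command line (without the password) from the parsed properties."""
    # Assumed key names; the host value may carry a port, e.g. 'host:port'.
    host, _, port = props["com.cloudera.cmf.db.host"].partition(":")
    argv = ["psql", "-h", host]
    if port:
        argv += ["-p", port]
    argv += ["-U", props["com.cloudera.cmf.db.user"], props["com.cloudera.cmf.db.name"]]
    return argv


def psql_env(props, base_env):
    """Copy base_env and add PGPASSWORD so the password never appears in argv."""
    env = dict(base_env)
    env["PGPASSWORD"] = props["com.cloudera.cmf.db.password"]
    return env
```

A typical call would be `subprocess.run(psql_argv(props), env=psql_env(props, os.environ))`, after reading db.properties into `props`.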
I found out that 'kdc_admin_user' in the table 'configs' still uses the old realm. Not sure what to do with this information yet...
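A dump like this can be searched for the old realm table by table. A minimal sketch, assuming each CSV file was exported with a header row and that there is one file per table; it reports the file, record number, column and value of every cell that contains the realm string. It only reads the dump and does not touch the database.

```python
import csv
import io
from pathlib import Path


def find_in_csv_text(name, text, needle):
    """Return (name, record number, column, value) for each cell containing needle."""
    hits = []
    # Record numbers count data rows only; the header row is not counted.
    for record_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        for column, value in row.items():
            # Extra unnamed fields arrive as a list under key None; skip them.
            if isinstance(value, str) and needle in value:
                hits.append((name, record_no, column, value))
    return hits


def find_in_dump_dir(dump_dir, needle):
    """Search every *.csv file in dump_dir (one file per table, assumed layout)."""
    hits = []
    for path in sorted(Path(dump_dir).glob("*.csv")):
        hits.extend(find_in_csv_text(path.name, path.read_text(errors="replace"), needle))
    return hits
```

Calling `find_in_dump_dir(dump_dir, "OLDREALM")` on the dump directory lists every remaining reference, which makes it easier to confirm that 'kdc_admin_user' is the only one left.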
So we went ahead and updated the value of 'kdc_admin_user' in the table 'configs'.
This gives better results: the commands I see in the server log file now look like they could work. Before, we were getting:
KADMIN='kadmin -k -t /var/run/cloudera-scm-server/cmf9109851999959873240.keytab -p cloudera-scm/admin@OLDREALM -r OLDREALM'
Now we get:
KADMIN='kadmin -k -t /var/run/cloudera-scm-server/cmf9109851999959873240.keytab -p cloudera-scm/admin@NEWREALM -r NEWREALM'
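The two log lines differ only in the principal and the -r realm, which both changed once kdc_admin_user was updated. A hypothetical Python helper that rebuilds a command of the same shape from a keytab path and an admin principal can be handy for trying the command by hand. Deriving -r from the principal's @REALM suffix is an assumption that matches these two log lines, not a description of Cloudera Manager's code.

```python
import shlex


def split_principal(principal):
    """Split 'primary/instance@REALM' into (name, realm); raise if there is no realm."""
    name, sep, realm = principal.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"principal has no realm: {principal!r}")
    return name, realm


def kadmin_command(keytab, admin_principal):
    """Rebuild a 'kadmin -k -t ... -p ... -r ...' line like the one in the server log."""
    _, realm = split_principal(admin_principal)
    argv = ["kadmin", "-k", "-t", keytab, "-p", admin_principal, "-r", realm]
    # shlex.join quotes any argument that would need it in a POSIX shell.
    return shlex.join(argv)
```

For example, `kadmin_command("/path/to/admin.keytab", "cloudera-scm/admin@NEWREALM")` returns a command ending in `-p cloudera-scm/admin@NEWREALM -r NEWREALM`, which can then be run against a keytab you created yourself.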
The error is now:
kadmin: Key table entry not found while initializing kadmin interface
This makes sense: I think Cloudera Manager is trying to use an outdated keytab to log in as cloudera-scm/admin. I can only assume that this keytab is stored somewhere in the PostgreSQL database. How could we find and update this entry?
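kadmin typically reports "Key table entry not found" when the keytab given with -t has no matching key for the principal requested with -p, so the keytab contents are worth checking directly. A minimal sketch that parses `klist -kt <keytab>` output and tells whether a given principal is present; it assumes the MIT Kerberos layout (KVNO, timestamp, principal, optionally followed by an enctype) and may need adjusting for other formats. The cmf....keytab name suggests a temporary file, so this is mainly useful on a keytab you export or keep yourself.

```python
def keytab_principals(klist_output):
    """Collect principals from `klist -kt` output (MIT Kerberos layout assumed).

    Data lines look like '   2 01/01/18 12:00:00 name/instance@REALM',
    i.e. a numeric KVNO first; header and separator lines are skipped.
    """
    principals = set()
    for line in klist_output.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue
        # The principal is the first token after the KVNO that contains '@'.
        for token in fields[1:]:
            if "@" in token:
                principals.add(token)
                break
    return principals


def has_principal(klist_output, principal):
    """True if the keytab listing contains an entry for principal."""
    return principal in keytab_principals(klist_output)
```

If `has_principal(output, "cloudera-scm/admin@NEWREALM")` is false for the keytab in use, the error above is expected.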
Sorry to bump the thread like this, but this is still an issue.
Does the lack of an answer mean that no solution exists and we should reinstall the whole cluster?
I would say it is most likely because few other users have faced a similar issue. The community is mainly peer-to-peer, but Clouderans do participate as time allows. Sometimes I can help by pointing to some documentation or a blog post, but I cannot think of one that fits your situation.
I am a bit confused about what you are trying to do and what is happening.
I'll start off with how things should work:
If you are managing your own krb5.conf, then disable Manage krb5.conf through Cloudera Manager in your Cloudera Manager Kerberos configuration.
If that is disabled, you can then distribute your own /etc/krb5.conf files to all nodes and Cloudera Manager will not touch them.
Even if Manage krb5.conf through Cloudera Manager is enabled, Cloudera Manager does not push out krb5.conf files automatically. It is a fully manual process. The agent itself does not create /etc/krb5.conf.
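When Manage krb5.conf through Cloudera Manager is disabled, the file you distribute yourself only needs to name the new realm consistently. A minimal sketch that renders such a file using standard MIT krb5.conf sections; real deployments usually need more [libdefaults] settings (ticket lifetimes, encryption types and so on), and the realm, KDC host and DNS domain in the usage note are hypothetical.

```python
def render_krb5_conf(realm, kdc_host, dns_domain=None):
    """Render a minimal krb5.conf where default_realm, [realms] and [domain_realm] agree."""
    lines = [
        "[libdefaults]",
        f"  default_realm = {realm}",
        "",
        "[realms]",
        f"  {realm} = {{",
        f"    kdc = {kdc_host}",
        # Assumes the admin server runs on the KDC host.
        f"    admin_server = {kdc_host}",
        "  }",
    ]
    if dns_domain:
        # Map both the domain and its subdomains to the realm.
        lines += [
            "",
            "[domain_realm]",
            f"  .{dns_domain} = {realm}",
            f"  {dns_domain} = {realm}",
        ]
    return "\n".join(lines) + "\n"
```

For example, `render_krb5_conf("NEWREALM", "kdc.example.com", "example.com")` produces a file whose default_realm, [realms] entry and [domain_realm] mappings all point at NEWREALM.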
The only way the agent could be doing what you describe is if, for some reason, the "DeployClientConfigsOfCluster" command were running repeatedly. We would see this in the agent log of the affected host.
If this is happening, please share the agent log file with us or at least the parts that show evidence the agent is creating /etc/krb5.conf.
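Finding that evidence in a long agent log is easier with a filter. A minimal sketch that returns the numbered lines mentioning DeployClientConfigsOfCluster or krb5.conf; the exact wording of the agent's log messages is not known here, so the pattern is deliberately broad. On many installations the agent log lives at /var/log/cloudera-scm-agent/cloudera-scm-agent.log, but treat that path as an assumption.

```python
import re

# Deliberately broad: the agent's exact log wording is not assumed.
EVIDENCE = re.compile(r"DeployClientConfigsOfCluster|krb5\.conf")


def evidence_lines(log_lines):
    """Return (line number, text) for each log line matching EVIDENCE."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if EVIDENCE.search(line)
    ]
```

For example, `with open(path, errors="replace") as log: matches = evidence_lines(log)` collects the relevant lines to share in the thread.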
Thanks for your quick reply.
I am experimenting with Cloudera Sandbox 5.13 (the issue is also present in 5.8 and 5.10). We are planning to implement Kerberos on our PROD cluster, so I wanted to try every configuration and check whether we can update the realm in case we need to in the future.
I have checked the agent's log file, but there is no such log entry for "DeployClientConfigsOfCluster".
Here are the steps to reproduce the issue:
1) Configure the Kerberos KDC with the default realm HADOOP.COM.
2) Apply the same setting in Cloudera Manager using the "Enable Kerberos" wizard, and select Manage krb5.conf through Cloudera Manager.
3) Check that all services are running without any issue.
4) Now, destroy the Kerberos KDC database using the command "kdb5_util destroy", in order to change the default realm to "QUICKSTART.CLOUDERA".
5) Update the default realm in /etc/krb5.conf manually and re-create the KDC database using "kdb5_util create -s".
6) Update default_realm in Cloudera Manager and use Import Kerberos Account Manager Credentials.
7) Regenerate the credentials (service principals) using "Generate Missing Credentials".
8) Restart all services/roles.
9) Now, restart the Cloudera Manager Agent to check whether it picks up the updated default_realm or the old one. It recreates krb5.conf with the old default_realm.
10) You can also test by removing the /etc/krb5.conf file and restarting the Cloudera Manager Agent. It recreates krb5.conf with the old default_realm.
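For steps 9 and 10, a quick way to tell whether the regenerated /etc/krb5.conf still carries the old realm is to list the realms it defines. A minimal sketch, assuming the MIT krb5.conf layout where each realm in [realms] opens with `NAME = {`; HADOOP.COM and QUICKSTART.CLOUDERA are the realms from the steps above.

```python
def defined_realms(krb5_conf_text):
    """Return realm names opened with 'NAME = {' inside the [realms] section."""
    realms, section, depth = [], None, 0
    for raw in krb5_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        # Section headers only count outside of any {...} block.
        if depth == 0 and line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if line.endswith("{"):
            if section == "realms" and depth == 0:
                realms.append(line[:-1].split("=", 1)[0].strip())
            depth += 1
        elif line.startswith("}"):
            depth = max(depth - 1, 0)
    return realms


def stale_realms(krb5_conf_text, expected_realm):
    """Realms still defined in the file other than expected_realm."""
    return [r for r in defined_realms(krb5_conf_text) if r != expected_realm]
```

Running `stale_realms(text, "QUICKSTART.CLOUDERA")` on the file right after restarting the agent returns a non-empty list if any realm other than the expected one is still defined.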
Please find attached the Cloudera Manager Agent log file, in case it helps identify the issue.
I would try restarting Cloudera Manager with the following:
# service cloudera-scm-server restart
It sounds to me as if the krb5.conf file elements from before the realm change may be retained in memory.
Restarting should help us determine whether it is a transient issue or whether the old realm is being loaded from the CM database.
If restarting does not solve the issue for you, please provide access to the CM and the agent logs from the time you are seeing the issue.
The agent does not have any code to update krb5.conf, so my point was that it would have to be taking its instructions from Cloudera Manager.