It might be a bit complicated for us to migrate to the LDAP backend, so we were trying to avoid it.
It is possible the OS upgrade also involved changes to the pam behavior, but we do not have any system with the old state to check that.
We understand that KRB5CCNAME is needed for the TGT and must be set for Hue.
What about unsetting it before calling the PAM authenticate, and possibly setting it back when returning from the call?
Would this avoid the side effects on the PAM login rules?
Tough situation indeed. I don't think it is possible to unset KRB5CCNAME like that, as it is an environment variable for all of Hue. If you unset it in one place, it would impact the entire server.
There may be something elegant here, but I'm not seeing it.
I think we may be in for quite an aggravating adventure trying to get that to work.
Perhaps we can look to mitigation instead and make sure the Hue kt_renewer runs kinit frequently.
Assuming users are not logging out of Hue a great deal, this may be enough to work around the issue. Even if the file is deleted, it will be added back pretty quickly.
In CM, go to the Hue configuration and search for Hue Keytab Renewal Interval:
- Set it to 10 seconds.
- Save and restart the Hue service.
You could even set it to 5 seconds or less if need be, as the kinit is not much load on the KDC at all.
Happy weekend... hope the workaround makes it a better one :-)
We hope the issue might be corrected in a future CDH version.
Meanwhile, we decided on a different workaround: we made the ccache unwritable by the hue process (so it cannot be destroyed), disabled the renewers, and set up cron jobs on the servers to do the renewal.
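For anyone following along, the pieces of that workaround might look roughly like the following. This is only a sketch: the ccache path comes from this thread (/var/run/hue/krb5_hue_ccache), but the ownership scheme, keytab path, principal, and schedule are assumptions to adapt per host.

```shell
# Make the ccache unwritable by the hue process so Hue/PAM cannot destroy it;
# a root-owned cron job can still replace its contents. The hue process keeps
# read access through the group.
chown root:hue /var/run/hue/krb5_hue_ccache
chmod 640 /var/run/hue/krb5_hue_ccache

# Root crontab entry: re-kinit into the Hue ccache every hour, well inside the
# ticket lifetime. Keytab path and principal below are assumptions.
0 * * * * KRB5CCNAME=/var/run/hue/krb5_hue_ccache kinit -kt /etc/security/keytabs/hue.keytab hue/$(hostname -f)
```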
This seems to be working: the users get their tickets cached in the fallback location /tmp, and the hue process is happy with its own tickets in /var/run/hue/.
Your Hue credentials cache needs to be updated by the Hue kt_renewer, so I don't think your solution will work past the time when the ticket expires. Without a way to update the cache, eventually, the TGT will expire and Hue won't be able to communicate with any other services.
I think the workaround I proposed (decreasing the interval for renewal) will work best and give you the most service uptime without restarting.
What really surprises me about all of this is that logging out of Hue is actually triggering anything related to PAM; there should be no need to let the PAM module know of the Hue session terminating. I checked, and the simple PAM backend does not have a "logout" function, so if a user logs out of Hue and the accounts/logout call is made, this skips any interaction with the backend. Essentially, PAM and the OS should have no knowledge of a Hue logout. The "pam.py" package also has no logout hooks.
The "pam.py" file also doesn't really load any modules. Rather, it makes calls to PAM and returns success if the PAM login succeeds. After that, Hue and the supporting code should have no session/state information relevant to PAM; everything is at the Hue / Django session level.
The more I think about this, the less it seems possible that initiating a log out from Hue could trigger any calls to pam modules.
I think we must be missing something in terms of understanding how the problem occurs. Not sure what though.
The credential cache is updated, because we update it with the cron job. The ticket expires after a few hours, or a day, so there is plenty of time for the cron job to renew it.
Sorry for not being clear: the issue is not PAM logging out. The issue is only at login, with pam.authenticate using /var/run/hue/krb5_hue_ccache as the user ccache when the user logs in, instead of /tmp/krb5_pam_XXXXXX, possibly due to the inherited environment.
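That inheritance is easy to demonstrate: pam.authenticate runs inside the Hue process, so the PAM stack (and anything it spawns) sees Hue's KRB5CCNAME rather than a per-user default. A minimal sketch, with the ccache path taken from this thread and the child process standing in for whatever the PAM module runs:

```python
import os
import subprocess

# Simulate the Hue process environment: KRB5CCNAME points at the service ccache.
os.environ['KRB5CCNAME'] = 'FILE:/var/run/hue/krb5_hue_ccache'

# Any child (here, a stand-in for what the PAM stack might spawn) inherits it,
# so it would use the Hue ccache instead of something like /tmp/krb5_pam_XXXXXX.
child_env = subprocess.run(
    ['sh', '-c', 'echo "$KRB5CCNAME"'],
    capture_output=True, text=True,
).stdout.strip()
print(child_env)  # FILE:/var/run/hue/krb5_hue_ccache

# Removing the variable makes the Kerberos libraries fall back to their
# default ccache location for subsequent calls.
os.environ.pop('KRB5CCNAME', None)
```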
In my first message I talked about the user logging in/out, but at the time we were still investigating the issue.
In the following messages we were always talking about the login procedure.
Yay, I'm not going crazy... just thought this was a logout issue.
Based on what I've seen, then, I think we may be able to work with this.
Perhaps we can do something special to unset KRB5CCNAME in pam.py... I'll take a quick peek.
As for the cron job, yeah, that will work. I get it now.
I guess your workaround should be OK. The only other option I could see would be to wrap the pam.authenticate() call with an unset and reset of KRB5CCNAME. Assuming authentication takes milliseconds, it would be unlikely that Hue is attempting to retrieve cache information at that moment, but I don't know that it is any better than what you are up to.
For instance, in desktop/core/src/desktop/auth/backend.py:

```python
if pam.authenticate(username, password, desktop.conf.AUTH.PAM_SERVICE.get()):
```

and then after auth:

```python
os.environ['KRB5CCNAME'] = desktop.conf.KERBEROS.CCACHE_PATH.get()
```

NOTE: we would need to import os in backend.py to do that.
So possibly, something like this would work:
```python
class PamBackend(DesktopBackendBase):
  """
  Authentication backend that uses PAM to authenticate logins. The first user
  to login will become the superuser.
  """

  @metrics.pam_authentication_time
  def authenticate(self, request=None, username=None, password=None):
    username = force_username_case(username)

    # pop() instead of del: no KeyError if the variable is already unset
    os.environ.pop('KRB5CCNAME', None)
    if pam.authenticate(username, password, desktop.conf.AUTH.PAM_SERVICE.get()):
      os.environ['KRB5CCNAME'] = desktop.conf.KERBEROS.CCACHE_PATH.get()
      is_super = False
      if User.objects.count() == 0:
        is_super = True

      try:
        if desktop.conf.AUTH.IGNORE_USERNAME_CASE.get():
          user = User.objects.get(username__iexact=username)
        else:
          user = User.objects.get(username=username)
      except User.DoesNotExist:
        user = find_or_create_user(username, None)

      if user is not None and user.is_active:
        profile = get_profile(user)
        profile.creation_method = UserProfile.CreationMethod.EXTERNAL.name
        profile.save()
        user.is_superuser = is_super
        ensure_has_a_group(user)
        user.save()
        user = rewrite_user(user)

      return user

    os.environ['KRB5CCNAME'] = desktop.conf.KERBEROS.CCACHE_PATH.get()
    return None

  @classmethod
  def manages_passwords_externally(cls):
    return True
```
Might not be worth it, though.
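If someone does go this route, one refinement worth considering is wrapping the swap in try/finally so that KRB5CCNAME is restored even when pam.authenticate raises, and using os.environ.pop so an already-unset variable doesn't cause a KeyError. A minimal, self-contained sketch; the helper name is hypothetical, and in Hue the restore value would come from desktop.conf.KERBEROS.CCACHE_PATH.get():

```python
import os

def call_without_krb5ccname(authenticate, restore_value):
    """Call `authenticate` with KRB5CCNAME unset, then restore it.

    `authenticate` is any zero-argument callable (in Hue it would wrap the
    pam.authenticate call); `restore_value` is the path to put back.
    Returns whatever `authenticate` returns.
    """
    os.environ.pop('KRB5CCNAME', None)  # no KeyError if already unset
    try:
        return authenticate()
    finally:
        # Restored even if authenticate() raises, so the rest of Hue keeps
        # talking to the KDC with its own ccache.
        os.environ['KRB5CCNAME'] = restore_value
```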
In Cloudera Manager, make sure that you have started HDFS and all the necessary services. I ran into this problem as well, and starting the cluster's services resolved the issue.