Member since
03-04-2016
10
Posts
0
Kudos Received
0
Solutions
07-22-2019
01:09 PM
The credential cache is kept up to date, because we refresh it with the cron job. The ticket expires after a few hours, or 1 day, so there is plenty of time for the cron job to renew it. Sorry for not being clear: the issue is not PAM logging out. The issue occurs only at login, when pam.authenticate uses /var/run/hue/krb5_hue_ccache as the user ccache instead of /tmp/krb5_pam_XXXXXX, possibly because of the inherited environment. In my first message I talked about the user logging in/out, but at the time we were still investigating the issue. In the following messages we were always talking about the login procedure. Regards, Maurizio
07-22-2019
09:14 AM
Thank you. We hope the issue will be corrected in a future CDH version. Meanwhile, we decided to use a different workaround: we made the ccache unwritable to the hue process (so it cannot be destroyed), disabled the renewers and scheduled cron jobs on the servers to do the renewal. This seems to be working: the users get their tickets cached in the fallback location /tmp, and the hue process is happy with its own tickets in /var/run/hue/. Regards, Maurizio
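A minimal sketch of what such a cron-driven renewal could look like, assuming the tickets are re-obtained from a keytab with the standard MIT `kinit` options (`-k`, `-t`, `-c`). The keytab path and principal in the usage comment are hypothetical placeholders, and the post does not say whether the actual cron jobs use a keytab or `kinit -R`.

```python
import subprocess


def build_kinit_command(keytab, principal, ccache):
    """Build a kinit invocation that obtains a fresh TGT from a keytab.

    -k        use a keytab instead of prompting for a password
    -t FILE   keytab file to read
    -c CACHE  credential cache to write the ticket into
    """
    return ["kinit", "-k", "-t", keytab, "-c", ccache, principal]


def renew(keytab, principal, ccache):
    """Run kinit and raise CalledProcessError if it exits non-zero."""
    subprocess.check_call(build_kinit_command(keytab, principal, ccache))


# Hypothetical usage from a script started by cron (placeholder values):
# renew("/path/to/hue.keytab",
#       "hue/host.example.com@EXAMPLE.COM",
#       "/var/run/hue/krb5_hue_ccache")
```

Because the ccache is unwritable to the hue process in this workaround, the script would have to run as a user that can write the file while leaving it readable by hue.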
07-19-2019
02:09 PM
It might be a bit complicated for us to migrate to the LDAP backend, so we were trying to avoid it. It is possible that the OS upgrade also involved changes to the PAM behavior, but we do not have any system in the old state to check that. We understand that KRB5CCNAME is needed for the TGT and must be set for Hue. What about unsetting it before calling the PAM authenticate function, and setting it back when the call returns? Would this avoid the side effects on the PAM login rules? Maurizio
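The idea of unsetting the variable around the PAM call could be sketched as a small context manager. This is only an illustration, not Hue code: `authenticate` is a stand-in for whatever callable performs the PAM check, and `without_env` and `authenticate_without_ccache` are hypothetical names. Note that the environment is process-wide, so in a multi-threaded server other threads would also see the variable missing for the duration of the call.

```python
import os
from contextlib import contextmanager


@contextmanager
def without_env(name):
    """Temporarily remove an environment variable, restoring it on exit."""
    # pop() on os.environ also unsets the variable for C libraries
    # running in this process.
    saved = os.environ.pop(name, None)
    try:
        yield saved
    finally:
        if saved is not None:
            os.environ[name] = saved
        else:
            # The variable was absent before; make sure it is absent again.
            os.environ.pop(name, None)


def authenticate_without_ccache(authenticate, username, password):
    """Call `authenticate` (a stand-in for the PAM call) without KRB5CCNAME."""
    with without_env("KRB5CCNAME"):
        return authenticate(username, password)
```

Whether this actually prevents pam_krb5 from touching the Hue ccache has not been verified here.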
07-19-2019
12:31 PM
Thank you for your answer. We (my colleagues and I) were able to reproduce the issue and pin it down to PAM.
First of all, we are using Ubuntu 18.04 and CM 6.2.0. This might be related to the error, which seems to be tied to the PAM usage from a library inside Hue, since we upgraded in one step from Ubuntu 16.04 and CM 6.1.0 to the current configuration. The error is triggered by the PAM login authentication.
Informally, as far as we understand, when Hue starts, it initializes its own Kerberos credential cache in /var/run/hue/krb5_hue_ccache, as defined by the ccache_path Hue configuration parameter and mirrored in the KRB5CCNAME environment variable. This is also the ccache known to the ticket renewer.
What seems to be happening is that, after the Hue Server initialization, KRB5CCNAME is left in the OS environment and is inherited by the Python pam.authenticate function (as called at line 329 of backend.py). When any user logs in from the WebUI, the PAM login rule passes that environment to its auth/krb5 component, which tries to reuse /var/run/hue/krb5_hue_ccache as if it belonged to the user. Since that file holds the hue tickets, the operation fails: PAM/krb5 invalidates the file, deleting it, and then falls back to the next possible file location (usually /tmp/krb5_pam_XXXXXX).
/var/run/hue/krb5_hue_ccache has now been deleted, and the Ticket Renewer will no longer find any ticket to renew. The user is logged in, but the Hue server has lost the tickets it needs to administer all the other services.
Probably, while KRB5CCNAME must be set for Hue to interact correctly with the other components of the Cloudera system, it should be removed from os.environ while interacting with the PAM subsystem, which should instead rely on the local OS environment.
We are still trying to find a local workaround for the issue, but we hope this info can be useful to help fix the bug. Regards, Maurizio
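To illustrate why a variable set from Python is visible to native code loaded into the same process (as a PAM module would be), here is a small sketch that reads the environment through the C library's `getenv` via `ctypes`. It assumes a Linux system where `ctypes.CDLL(None)` exposes libc symbols; `c_level_getenv` is a hypothetical helper name, and the sketch does not involve PAM itself.

```python
import ctypes
import os


def c_level_getenv(name):
    """Read an environment variable through the C library's getenv()."""
    libc = ctypes.CDLL(None)  # symbols of the running process, including libc
    libc.getenv.restype = ctypes.c_char_p
    libc.getenv.argtypes = [ctypes.c_char_p]
    value = libc.getenv(name.encode())
    return None if value is None else value.decode()


def demo():
    """Show that assignments to os.environ reach C-level getenv()."""
    os.environ["KRB5CCNAME"] = "/var/run/hue/krb5_hue_ccache"
    seen_while_set = c_level_getenv("KRB5CCNAME")
    del os.environ["KRB5CCNAME"]
    seen_after_delete = c_level_getenv("KRB5CCNAME")
    return seen_while_set, seen_after_delete
```

Calling `demo()` returns the value seen by C code while the variable is set, followed by what it sees after the variable is deleted from `os.environ`.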
07-18-2019
10:40 AM
I meant people logging in to or out of the Hue WebUI, being authenticated using the desktop.auth.backend.PamBackend (and PAM login using an LDAP backend). The system used to work regularly with no issue until we migrated from CM 6.1 to CM 6.2 a few weeks ago. We activated "auditd" to trace access to the ccache file, and the audit log shows access to the ccache when users "log out" from the Hue WebUI:
time->Thu Jul 18 15:31:05 2019
type=PROCTITLE msg=audit(1563456665.989:2959): proctitle=707974686F6E322E37002F6F70742F636C6F75646572612F70617263656C732F4344482D362E322E302D312E636468362E322E302E70302E3936373337332F6C69622F6875652F6275696C642F656E762F62696E2F6875650072756E6368657272797079736572766572
type=PATH msg=audit(1563456665.989:2959): item=1 name="/var/run/hue/hue_krb5_ccache" inode=1169 dev=00:17 mode=0100600 ouid=982 ogid=981 rdev=00:00 nametype=DELETE cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
type=PATH msg=audit(1563456665.989:2959): item=0 name="/var/run/hue/" inode=1150 dev=00:17 mode=040755 ouid=982 ogid=981 rdev=00:00 nametype=PARENT cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
type=CWD msg=audit(1563456665.989:2959): cwd="/run/cloudera-scm-agent/process/11096-hue-HUE_SERVER"
type=SYSCALL msg=audit(1563456665.989:2959): arch=c000003e syscall=87 success=yes exit=0 a0=7f9a9c33f820 a1=2 a2=1 a3=0 items=2 ppid=19438 pid=19449 auid=4294967295 uid=982 gid=981 euid=982 suid=982 fsuid=982 egid=981 sgid=981 fsgid=981 tty=(none) ses=4294967295 comm="python2.7" exe="/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/hue/build/env/bin/python2.7" key="hue_krb5_ccache"
It seems that the process deleting the file is
\_ python2.7 /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/hue/build/env/bin/hue runcherrypyserver
We tried to pin down the specific activity in the source code, but with no success. Which other details do you think you need? We might include more renewer log errors, but those are just about the file no longer existing. From the Cloudera Manager point of view, everything is green and working correctly, because the processes are all active. Thanks, Maurizio
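For readers of the audit record above: the `proctitle` field is the process command line, hex-encoded with NUL bytes between arguments, and on x86_64 (`arch=c000003e`) syscall 87 is `unlink`. A short sketch for decoding the field follows; `decode_proctitle` is a hypothetical helper name and the hex string is copied from the PROCTITLE record.

```python
def decode_proctitle(hex_value):
    """Decode an auditd proctitle field into its list of arguments."""
    raw = bytes.fromhex(hex_value)
    # Arguments are separated by NUL bytes in the raw command line.
    return [part.decode("utf-8", "replace") for part in raw.split(b"\x00")]


PROCTITLE = (
    "707974686F6E322E37002F6F70742F636C6F75646572612F70617263656C732F"
    "4344482D362E322E302D312E636468362E322E302E70302E3936373337332F6C"
    "69622F6875652F6275696C642F656E762F62696E2F6875650072756E63686572"
    "72797079736572766572"
)

argv = decode_proctitle(PROCTITLE)
print(" ".join(argv))
```

The decoded arguments are `python2.7`, the `hue` launcher path under the CDH parcel, and `runcherrypyserver`, which agrees with the process identified in the post.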
07-18-2019
06:15 AM
As a quick update, it seems the ccache file is deleted by a Hue backend server after a user logs in to or out of the system. Maurizio
07-18-2019
04:53 AM
After migrating to CM 6.2.0, we are having a similar problem with user access in Hue. We pinned it down to /var/run/hue/hue_krb5_ccache disappearing, which produces the error GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('No Kerberos credentials available (default cache: /var/run/hue/hue_krb5_ccache)', -1765328243)) in /var/log/hue/error.log on the Hue servers (we have a configuration with one balancer and 4 backends). We have been forced to regenerate the keytabs for the ticket renewers and restart Hue, just to make the system work for a few hours, until the Kerberos cache disappears again. We are currently trying to monitor what is actually accessing or removing the ccache. Thanks, Maurizio
04-08-2019
07:09 AM
Thank you. We will see if the problem is solved when we are able to update the system (we are currently in a production phase and we refrain from introducing changes unless strictly necessary).
02-20-2019
02:51 AM
It did not work. There were supervisord processes older than the latest restart. Running your suggested command on all the clients did not work. We also completely killed all server/agents, and killed the lingering supervisord processes, before restarting the agents, and nothing changed. Consider that TLS used to work in 5.15, and still works correctly for everything in the Manager and in the CDH communications, except for the Host/Security Inspectors. Is there a way to execute the failing mgmt/mgmt.sh from the console to get some additional insight into the issue? Thanks anyway.
02-18-2019
03:20 AM
We just upgraded our Cloudera Manager from 5.15 to 6.1.0
TLS security was enabled, and it works on server/agent communication and for all the CDH web interfaces. No specific issue there.
We only have problems with the CM "Host Inspector" and "Security Inspector", which cannot run on any host and fail with the message
"IOException thrown while collecting data from host: Unrecognized SSL message, plaintext connection?"
Our CA certificates are included in jssecacerts, we restarted the server/agents, and every other communication seems to work.
Any idea?
Thanks
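As background on the error text: Java's TLS stack generally reports "Unrecognized SSL message, plaintext connection?" when the bytes it receives do not parse as a TLS record, for example when the peer answers in plain HTTP. The sketch below shows that kind of check on the first bytes of a response. `looks_like_tls_record` is a hypothetical helper written for Python 3, and which endpoint the inspectors actually contact is not stated in the post.

```python
def looks_like_tls_record(first_bytes):
    """Return True if the bytes start like a TLS record header.

    A TLS record begins with a content type (20-23) followed by a
    protocol version whose major byte is 3.
    """
    if len(first_bytes) < 3:
        return False
    content_type = first_bytes[0]
    major_version = first_bytes[1]
    return content_type in (20, 21, 22, 23) and major_version == 3


# A TLS handshake record versus a plaintext HTTP response line.
print(looks_like_tls_record(b"\x16\x03\x03\x00\x2a"))
print(looks_like_tls_record(b"HTTP/1.1 400 Bad Request"))
```

A plaintext reply on a port where TLS is expected would suggest that one side of that particular connection is not configured for TLS, even if the other channels are.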
Labels: Cloudera Manager