Member since: 04-22-2014
Posts: 1218
Kudos Received: 341
Solutions: 157
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 22780 | 03-03-2020 08:12 AM |
| | 13196 | 02-28-2020 10:43 AM |
| | 3815 | 12-16-2019 12:59 PM |
| | 3416 | 11-12-2019 03:28 PM |
| | 5187 | 11-01-2019 09:01 AM |
07-19-2019
01:16 PM
1 Kudo
@mmmunafo, Thanks for the clear description. While perusing the Hue source, I did notice that KRB5CCNAME was set and was curious why. It was the only thing I could think of that would be influencing PAM behavior. Very cool... let me see if I can figure out why we are setting KRB5CCNAME and whether we can get rid of it.
07-18-2019
06:31 PM
@Debashish, yup, stderr.log looks good. Please check your Region Server log file on that host (usually it is in /var/log/hbase and has REGIONSERVER in its name). I am guessing you will see an attempt to start up and then some sort of failure.
07-18-2019
06:28 PM
@mmmunafo, I upgraded to 6.2 and PAM using the default "login" module doesn't cause the credentials cache to be deleted. Any other ideas of how these two things could be tied in? Perhaps the PAM module is doing something special, but it is the PID of the Hue process that is in the audit...
07-18-2019
05:38 PM
@mmmunafo, WOW! I see what you mean! How did you track this down to happening when a user logs out of Hue? I have to admit that PAM auth is used less than most other forms of authentication, so it is possible that no one else has hit this issue or bug yet. If you have the runcpserver.log covering the time when a user logs out, I'd be interested in seeing it in case it yields any clues. In the meantime, I have a cluster I haven't upgraded to CDH 6.2 yet, so I'll set up PAM auth in 6.1.1 and then upgrade just to see. It might take a couple of days, but I'll let you know if I think of anything before then.
07-18-2019
08:13 AM
@mmmunafo, What "system" are your users logging out of? If it is your OS, it sounds like a kdestroy is being done, and your users' cache should not be tied to Hue's cache, which is intended only for the Hue process on that host. Hue does not execute a "kdestroy" command, so it is very unlikely that Hue is removing the credentials cache. Happy to help, but I think we need some more details of what you are observing to understand the problem. Cheers, Ben
07-17-2019
10:23 PM
1 Kudo
@Debashish, let's make sure that we can confirm the cause of your problem first, though, before pursuing the znode stuff.
07-17-2019
10:22 PM
@Debashish, The "supervisor_status" file is not used by HBase, so this message should be non-fatal. I'd check the stderr.log file for the region server process to see if there were any other messages after that Permission Denied. My guess is that the region server log file may contain errors regarding znodes. If HBase created any ZooKeeper znodes while Kerberos was enabled, then the znodes will require auth and will need to be recreated. I think the solution I mentioned here should help: https://community.cloudera.com/t5/Cloudera-Manager-Installation/Disabling-Kerberos-on-Cloudera-EXpress-5-5-1-HBase-issue/m-p/42535/highlight/true#M7642
07-17-2019
10:09 PM
@TCloud, The configuration we want is the one that got us the following:

WrongHost: Peer certificate subjectAltName does not match host, expected srv-c01.mws.mds.xyz, got DNS:cm-r01nn01.mws.mds.xyz

This error means that the Cloudera Manager certificate only contains a SAN or CN subject value of cm-r01nn01.mws.mds.xyz. Since the agent is configured to connect to srv-c01.mws.mds.xyz, it tries to validate that the certificate is valid for srv-c01.mws.mds.xyz, and that check fails. This situation is addressed here: https://www.cloudera.com/documentation/enterprise/latest/topics/admin_cm_ha_tls.html#cloudera-manager-server-cert-requirements-for-HA

In order to make sure that clients can connect to CM by using both srv-c01.mws.mds.xyz and cm-r01nn01.mws.mds.xyz, we need to create a self-signed certificate that contains both in Subject Alternative Name. For a self-signed certificate, you could use:

keytool -keystore testkeystore.jks -storepass password -keypass password -alias cm-r01nn01.mws.mds.xyz -genkeypair -keysize 2048 -keyalg RSA -dname "CN=cm-r01nn01.mws.mds.xyz" -ext san=dns:cm-r01nn01.mws.mds.xyz,dns:srv-c01.mws.mds.xyz

If you do recreate the CM certificate like that, you will also need to replace the previous certificate with this one in any trust store you created, since a new key pair was created. Although it might require a bit more doing, the above should address the error you get when using TLS pass-through in HAProxy.

Next, we need to make sure that HAProxy routes requests to your primary CM host every time and only routes to the other host if the primary host fails. I believe this can be achieved by removing "balance roundrobin", but I'm not sure. It may also make sense to use "backup" directives in the server configuration for nn02, but I'm not sure about that either... our example doesn't seem to consider it necessary.
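To confirm that a keystore generated with the keytool command above really carries both names, something like the following could be used. This is only a sketch: it assumes a JDK (keytool) is on the PATH, it builds a throwaway keystore in a temp directory rather than touching your real one, and the `workdir` and `san_lines` variable names are just for illustration.

```shell
# Needs a JDK on PATH; the keystore here is a throwaway copy in a temp directory.
if command -v keytool >/dev/null 2>&1; then
  workdir="$(mktemp -d)"

  # Same options as the command in the post, only the keystore path differs.
  keytool -keystore "$workdir/testkeystore.jks" -storepass password -keypass password \
    -alias cm-r01nn01.mws.mds.xyz -genkeypair -keysize 2048 -keyalg RSA \
    -dname "CN=cm-r01nn01.mws.mds.xyz" \
    -ext san=dns:cm-r01nn01.mws.mds.xyz,dns:srv-c01.mws.mds.xyz >/dev/null 2>&1

  # List the certificate verbosely and keep only the SAN DNS entries.
  san_lines="$(keytool -list -v -keystore "$workdir/testkeystore.jks" \
    -storepass password -alias cm-r01nn01.mws.mds.xyz 2>/dev/null | grep 'DNSName:')"
  printf '%s\n' "$san_lines"

  rm -rf "$workdir"
else
  echo "keytool not found; install a JDK to run this check"
fi
```

If both hostnames show up as DNSName entries, an agent connecting to either name should pass the hostname check, provided it also trusts the certificate.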
07-17-2019
10:37 AM
@TCloud, Have you tried it and it failed? If so, what was the problem? You configure the agent with a hostname and a port, and it sends heartbeats to that host and port. If you have TLS enabled, then the same rules apply: If the client (agent) is doing validation, then it must be able to trust the signer of the CM certificate and it must be able to validate that the hostname it connected to is included in the certificate (in Subject Alt Name or CN subject). If you are doing agent authentication to CM, then CM must trust the signer of the certificate presented by the agent. I don't know if TLS termination at the balancer will work unless the balancer can authenticate. I'd recommend against termination with heartbeats.
07-17-2019
09:43 AM
@Roroka, Since the agent has not been able to heartbeat to Cloudera Manager, it does not know what parcels it needs, so the error you observe regarding "active_parcels.json" is a side effect of an earlier problem. Can you take a closer look at the cloudera-scm-agent.log to see what the first exception is (it probably mentions "heartbeat")? If you can include that and 10 or more lines before and after, that should help give us some context for the problem. If you can, also share with us the output of the following command when run on the host that is not able to heartbeat:

grep -v -e '^[[:space:]]*$' -e '^#' /etc/cloudera-scm-agent/config.ini

Thanks!
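In case it is not obvious what that grep filter does, here is a small sketch run against a made-up sample file instead of the real config.ini. The file contents and the `sample` and `filtered` variable names are assumptions for illustration only; your actual config will differ.

```shell
# Build a small sample file (hypothetical contents, not a real agent config).
sample="$(mktemp)"
printf '%s\n' \
  '[General]' \
  '# Hostname of the CM server.' \
  'server_host=cm.example.com' \
  '' \
  '   ' \
  '# Port that the CM server is listening on.' \
  'server_port=7182' > "$sample"

# Same filter as in the post: -v inverts the match and each -e adds a pattern.
# '^[[:space:]]*$' matches empty or whitespace-only lines; '^#' matches comment lines.
filtered="$(grep -v -e '^[[:space:]]*$' -e '^#' "$sample")"
printf '%s\n' "$filtered"

rm -f "$sample"
```

Only the section header and the two key=value lines are printed, so the output stays short enough to paste into a reply.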