Created on 03-19-2015 03:01 PM - edited 09-16-2022 02:24 AM
I installed the latest Cloudera Manager Server and Agent packages (5.3.2) started everything up fine, created my cluster, installed parcels for HDFS, HBase, Hive, Hue, Impala, Oozie, Yarn, and Zookeeper. I fixed all configuration and health issues. Everything was working exactly as expected. I then began following instructions I found here: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_sg_intro_kerb.html to get my cluster secured with Kerberos. Have my KDC fully configured and running. The wizard was working great until just after it prompted me for the admin principal (cloudera-scm/admin) and its password. After I entered those, it indicated that it had successfully authenticated that principal, and began to configure my services to be used with kerberos. At some point though (I believe as it was trying to turn the Cloudera Management Service back on) it failed, and emitted this:
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Login failure for hue/my.namenode.com@MY.REALM.COM from keytab hue.keytab at com.google.common.base.Throwables.propagate(Throwables.java:160) at com.cloudera.cmf.cdhclient.CdhExecutorFactory.createExecutor(CdhExecutorFactory.java:274) at com.cloudera.cmf.cdhclient.CdhExecutorFactory.createExecutor(CdhExecutorFactory.java:309) at com.cloudera.enterprise.AbstractCDHVersionAwarePeriodicService.<init>(AbstractCDHVersionAwarePeriodicService.java:73) at com.cloudera.cmon.firehose.AbstractHBasePoller.<init>(AbstractHBasePoller.java:95) at com.cloudera.cmon.firehose.HBaseFsckPoller.<init>(HBaseFsckPoller.java:53) at com.cloudera.cmon.firehose.Firehose.createSecurityAwarePollers(Firehose.java:446) at com.cloudera.cmon.firehose.Firehose.setupServiceMonitoringPollers(Firehose.java:436) at com.cloudera.cmon.firehose.Firehose.<init>(Firehose.java:311) at com.cloudera.cmon.firehose.Main.main(Main.java:527) Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Login failure for hue/my.namenode.com@MY.REALM.COM from keytab hue.keytab at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at com.cloudera.cmf.cdhclient.CdhExecutorFactory.createExecutor(CdhExecutorFactory.java:268) ... 8 more Caused by: java.lang.RuntimeException: java.io.IOException: Login failure for hue/my.namenode.com@MY.REALM.COM from keytab hue.keytab at com.google.common.base.Throwables.propagate(Throwables.java:160) at com.cloudera.cmf.cdhclient.CdhExecutorFactory$SecureClassLoaderSetupTask.run(CdhExecutorFactory.java:491) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: Login failure for hue/my.namenode.com@MY.REALM.COM from keytab hue.keytab at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:855) at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:279) at com.cloudera.cmf.cdh4client.CDH4ObjectFactoryImpl.login(CDH4ObjectFactoryImpl.java:194) at com.cloudera.cmf.cdhclient.CdhExecutorFactory$SecureClassLoaderSetupTask.run(CdhExecutorFactory.java:485) ... 5 more Caused by: javax.security.auth.login.LoginException: Connection refused at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:767) at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:584) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:784) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:721) at javax.security.auth.login.LoginContext$5.run(LoginContext.java:719) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:718) at javax.security.auth.login.LoginContext.login(LoginContext.java:590) at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:846) ... 8 more Caused by: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at sun.security.krb5.internal.TCPClient.<init>(NetClient.java:65) at sun.security.krb5.internal.NetClient.getInstance(NetClient.java:43) at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:372) at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:343) at java.security.AccessController.doPrivileged(Native Method) at sun.security.krb5.KdcComm.send(KdcComm.java:327) at sun.security.krb5.KdcComm.send(KdcComm.java:219) at sun.security.krb5.KdcComm.send(KdcComm.java:191) at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:319) at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:364) at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:735) ... 21 more
The wizard offered no recourse, or suggestions for how to continue. Going back to the CM "home," I can get the same error every time I try to restart the Cloudera Management Service. I get a different error if I try to restart my cluster first instead, but it also seems to indicate that kerberos is not working:
2015-03-19 21:25:43,255 ERROR org.apache.zookeeper.server.ZooKeeperServerMain: Unexpected exception, exiting abnormally java.io.IOException: Could not configure server because SASL configuration did not allow the ZooKeeper server to authenticate itself properly: javax.security.auth.login.LoginException: Connection refused at org.apache.zookeeper.server.ServerCnxnFactory.configureSaslLogin(ServerCnxnFactory.java:207) at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:87) at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:116) at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91) at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:121) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
However, the "security inspector" tool executes successfully, and all the principals listed under the Kerberos->credentials page match the principals I can see on my KDC's kadmin shell.
Any idea what's gone wrong, or how I can troubleshoot it?
Created 03-20-2015 08:45 AM
Update: I found the solution on another user's post here: https://community.cloudera.com/t5/Cloudera-Manager-Installation/aftper-config-kerberos-on-CDH5-Servi...
Looks like cloudera manager only uses TCP, even though it is not recommended to use TCP on your KDC, as there is little protection against denail-of-service attacks (http://linux.die.net/man/5/kdc.conf). Why is this not noted in the documentation as a requirement for the KDC?
Created 03-19-2015 05:18 PM
Created 03-20-2015 07:24 AM
I had CM create and deploy the KRB5.conf files originally. I've tried modifying them manually a few times since, since I can no longer use CM to push changes to them while the services refuse to start. The content of the file is:
[libdefaults] default_realm = MY.REALM.COM dns_lookup_kdc = true dns_lookup_realm = false ticket_lifetime = 36000 renew_lifetime = 604800 forwardable = true default_tgs_enctypes = aes256-cts:normal default_tkt_enctypes = aes256-cts:normal permitted_enctypes = aes256-cts:normal udp_preference_limit = 1 [realms] MY.REALM.COM = { kdc = MY.KDC.HOST.COM admin_server = MY.KDC.HOST.COM default_domain = MY.KDC.HOST.COM } [domain_realm] .my.realm.com = MY.REALM.COM my.realm.com = MY.REALM.COM [logging] kdc = FILE:/var/log/krb5kdc.log admin_server = FILE:/var/log/kadmin.log default = FILE:/var/log/krb5lib.log includedir /etc/krb5.conf.d/
I do have AES strong encryption in use, but both the security jars are present:
# ls -lah /usr/java/latest/jre/lib/security/*.jar -rw-rw-r-- 1 root root 2.5K May 31 2011 /usr/java/latest/jre/lib/security/local_policy.jar -rw-rw-r-- 1 root root 2.5K May 31 2011 /usr/java/latest/jre/lib/security/US_export_policy.jar
Lookups also seem to be returning as expected. Where should I go from here?
Created 03-20-2015 08:45 AM
Update: I found the solution on another user's post here: https://community.cloudera.com/t5/Cloudera-Manager-Installation/aftper-config-kerberos-on-CDH5-Servi...
Looks like cloudera manager only uses TCP, even though it is not recommended to use TCP on your KDC, as there is little protection against denail-of-service attacks (http://linux.die.net/man/5/kdc.conf). Why is this not noted in the documentation as a requirement for the KDC?
Created 05-06-2015 04:56 PM
Because we expect the KDC in a hadoop environment to be protected from this by network design.... the reason we hard set TCP rather than UDP is that in complex network environments, we can drive failures within the cluster. IMHO this is valid for internet/hostile network environments, which we strongly reccomend against being used for a cluster environments.
Created 05-06-2015 05:01 PM
Also DoS issue with TCP seems to have been resolved as of krb5 1.10 - https://krbdev.mit.edu/rt/Ticket/Display.html?id=1316
Created 05-06-2015 05:02 PM
We can evaluate expanding the example and definition of what we do by default as well, thanks for pointing this out, we'll continue to review.
Created 02-21-2020 02:58 AM
Try Setting up "udp" for the Kerberos Clients.
/etc/krb5.conf
[libdefaults]
udp_preference_limit = 1