Created 08-29-2016 08:25 AM
Hi,
I'm having problems starting the YARN App Timeline Server (HDP-2.4.0.0-169, kerberized cluster with Ambari 2.2.2.0).
Everything was working fine for several months until we had to relocate the servers to a different data center, which meant shutting the cluster down. I'm able to start the Active and Standby ResourceManagers (along with all NodeManagers), but the App Timeline Server fails with the following in the logs:
2016-08-28 18:21:51,903 FATAL applicationhistoryservice.ApplicationHistoryServer (ApplicationHistoryServer.java:launchAppHistoryServer(171)) - Error starting ApplicationHistoryServer
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to login
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceStart(ApplicationHistoryServer.java:112)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:169)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:178)
Caused by: java.io.IOException: Login failure for yarn/hdp-nn01.local.net@HADOOP.LOCAL from keytab /etc/security/keytabs/yarn.service.keytab: javax.security.auth.login.LoginException: Checksum failed
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:962)
    at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:275)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.doSecureLogin(ApplicationHistoryServer.java:335)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceStart(ApplicationHistoryServer.java:110)
    ... 3 more
Caused by: javax.security.auth.login.LoginException: Checksum failed
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
    at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
    at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:953)
    ... 6 more
Caused by: KrbException: Checksum failed
    at sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:102)
    at sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:94)
    at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:175)
    at sun.security.krb5.KrbAsRep.decrypt(KrbAsRep.java:149)
    at sun.security.krb5.KrbAsRep.decryptUsingKeyTab(KrbAsRep.java:121)
    at sun.security.krb5.KrbAsReqBuilder.resolve(KrbAsReqBuilder.java:285)
    at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:361)
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:776)
    ... 19 more
Caused by: java.security.GeneralSecurityException: Checksum failed
    at sun.security.krb5.internal.crypto.dk.AesDkCrypto.decryptCTS(AesDkCrypto.java:451)
    at sun.security.krb5.internal.crypto.dk.AesDkCrypto.decrypt(AesDkCrypto.java:272)
    at sun.security.krb5.internal.crypto.Aes256.decrypt(Aes256.java:76)
    at sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType.decrypt(Aes256CtsHmacSha1EType.java:100)
    ... 26 more
2016-08-28 18:21:51,904 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status -1
2016-08-28 18:21:51,906 INFO applicationhistoryservice.ApplicationHistoryServer (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ApplicationHistoryServer at hdp-nn01.local.net/192.168.12.73
************************************************************/
yarn.service.keytab is present on hdp-nn01.local.net, and krb5.conf seems to be intact.
Any assistance would be greatly appreciated.
Thanks.
Created 08-29-2016 08:53 AM
Can you please try kinit with the yarn.service.keytab and see whether it succeeds? You can get the YARN service principal name with the command below:
klist -kt /etc/security/keytabs/yarn.service.keytab
Take the principal name from the previous command and run
kinit -kt /etc/security/keytabs/yarn.service.keytab <yarn-service-principal-name>
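The two steps above can be combined into a small check. This is only a sketch: `keytab_principals` and `check_keytab` are hypothetical helper names, and it assumes the MIT Kerberos `klist` and `kinit` tools are on the PATH. It uses a throwaway credential cache so that existing tickets are left alone.

```shell
#!/bin/sh
# Print the unique principal names found in `klist -kt` output read from stdin.
# Data rows start with a numeric KVNO and end with the principal name.
keytab_principals() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 4 { print $NF }' | sort -u
}

# Hypothetical helper: attempt a keytab login for every principal in the keytab.
# Returns non-zero if any login fails.
check_keytab() {
    keytab=$1
    tmpdir=$(mktemp -d) || return 1
    status=0
    for p in $(klist -kt "$keytab" | keytab_principals); do
        # Point kinit at a temporary cache instead of the default one.
        if KRB5CCNAME="FILE:$tmpdir/cc" kinit -kt "$keytab" "$p"; then
            echo "OK   $p"
        else
            echo "FAIL $p"
            status=1
        fi
    done
    rm -rf "$tmpdir"
    return $status
}

# Example (run as a user that can read the keytab):
#   check_keytab /etc/security/keytabs/yarn.service.keytab
```

A FAIL line here reproduces the service's login failure outside of YARN, which narrows the problem down to the keytab or the KDC rather than the Hadoop configuration.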
Created 08-29-2016 09:42 AM
Thank you Santhosh. It seems that it expired?
$ klist -kt /etc/security/keytabs/yarn.service.keytab
Keytab name: FILE:/etc/security/keytabs/yarn.service.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 04/27/2016 15:56:20 yarn/hdp-nn01.local.net@HADOOP.LOCAL
   1 04/27/2016 15:56:20 yarn/hdp-nn01.local.net@HADOOP.LOCAL
   1 04/27/2016 15:56:20 yarn/hdp-nn01.local.net@HADOOP.LOCAL
   1 04/27/2016 15:56:20 yarn/hdp-nn01.local.net@HADOOP.LOCAL
   1 04/27/2016 15:56:20 yarn/hdp-nn01.local.net@HADOOP.LOCAL
Executing kinit -kt /etc/security/keytabs/yarn.service.keytab yarn/hdp-nn01.local.net@HADOOP.LOCAL gives me
kinit: Password incorrect while getting initial credentials
but I can't recall setting up the password.
Thanks.
Created 08-29-2016 10:08 AM
Looks like something is wrong with the principal yarn/hdp-nn01.local.net@HADOOP.LOCAL. Can you please check whether the account exists in the KDC, or whether it is locked? Run the command below on the machine where your KDC server is running.
kadmin.local -q "get_principal yarn/hdp-nn01.local.net@HADOOP.LOCAL"
Created 08-29-2016 10:35 AM
Seems to be there:
Authenticating as principal root/admin@HADOOP.LOCAL with password.
Principal: yarn/hdp-nn01.local.net@HADOOP.LOCAL
Expiration date: [never]
Last password change: Fri Jul 08 14:12:54 CEST 2016
Password expiration date: [none]
Maximum ticket life: 1 day 00:00:00
Maximum renewable life: 0 days 00:00:00
Last modified: Fri Jul 08 14:12:54 CEST 2016 (hdp-svc/admin@HADOOP.LOCAL)
Last successful authentication: [never]
Last failed authentication: [never]
Failed password attempts: 0
Number of keys: 4
Key: vno 2, aes256-cts-hmac-sha1-96
Key: vno 2, aes128-cts-hmac-sha1-96
Key: vno 2, des3-cbc-sha1
Key: vno 2, arcfour-hmac
MKey: vno 1
Attributes:
Policy: [none]
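One detail worth comparing between the two outputs in this thread: the keytab entries carry KVNO 1 (dated 04/27/2016), while the KDC reports its keys at vno 2 with a last password change on Jul 08 2016. That suggests the principal's keys were regenerated after the keytab was written, which would be consistent with the "Checksum failed" and "Password incorrect" errors. A minimal sketch for making that comparison, using hypothetical helper names `keytab_kvnos` and `kdc_kvnos` that only parse the text output shown above:

```shell
#!/bin/sh
# KVNOs stored in a keytab for one principal, parsed from `klist -kt` output on stdin.
keytab_kvnos() {
    awk -v p="$1" '$1 ~ /^[0-9]+$/ && $NF == p { print $1 }' | sort -un
}

# KVNOs the KDC holds, parsed from `kadmin.local -q "get_principal ..."` output on stdin.
# Only "Key: vno N, <enctype>" lines are read; the "MKey:" line is ignored.
kdc_kvnos() {
    awk '$1 == "Key:" && $2 == "vno" { sub(/,$/, "", $3); print $3 }' | sort -un
}

# Example, using the principal from this thread:
#   princ=yarn/hdp-nn01.local.net@HADOOP.LOCAL
#   klist -kt /etc/security/keytabs/yarn.service.keytab | keytab_kvnos "$princ"   # on the service host
#   kadmin.local -q "get_principal $princ" | kdc_kvnos                            # on the KDC host
```

If the two numbers differ, as they do with the outputs posted here (1 versus 2), the keytab no longer matches the keys in the KDC and needs to be exported again.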
Created 08-29-2016 11:41 AM
Can you please regenerate the keytab file and try kinit with the new keytab? Below is the command to export it.
kadmin.local -q "xst -k ~/yarn.service.keytab yarn/hdp-nn01.local.net@HADOOP.LOCAL"
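Note that `xst` (an alias for `ktadd`) generates new random keys by default, so the KVNO goes up and any older copy of this principal's keytab stops working. The command above writes the file to the home directory on the KDC host, so it still has to be moved into place on the service host. A minimal sketch of that last step follows; `install_keytab` is a hypothetical helper, and the default owner `yarn`, group `hadoop`, and mode 400 are assumptions that should be checked against the existing file before use.

```shell
#!/bin/sh
# Hypothetical helper: install a freshly exported keytab, keeping a backup of the old one.
# usage: install_keytab NEW_KEYTAB TARGET [OWNER] [GROUP]
# OWNER and GROUP default to yarn and hadoop, which are assumptions, not values from the thread.
install_keytab() {
    new=$1 target=$2 owner=${3:-yarn} group=${4:-hadoop}
    # Refuse to overwrite the target with a missing or empty file.
    [ -s "$new" ] || { echo "missing or empty: $new" >&2; return 1; }
    if [ -e "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi
    # Copy into place, readable only by the owning service account.
    install -o "$owner" -g "$group" -m 400 "$new" "$target"
}

# Example (as root on the service host, after copying the file over from the KDC):
#   install_keytab ~/yarn.service.keytab /etc/security/keytabs/yarn.service.keytab
#   kinit -kt /etc/security/keytabs/yarn.service.keytab yarn/hdp-nn01.local.net@HADOOP.LOCAL
```

Once kinit succeeds with the installed keytab, the App Timeline Server can be started again.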
Created 08-29-2016 03:32 PM
Thank you very much @Santhosh B Gowda -- that was it!