Created on 08-17-2017 07:05 PM
PROBLEM:
After upgrading from IOP (v4.2.5) to HDP (v2.6.x) in a Kerberized setup, Hive Server Interactive (HSI) fails to start, with the following in the HSI start log:
WARN impl.LlapZookeeperRegistryImpl: The cluster is not started yet (InvalidACL); will retry
ERROR impl.LlapZookeeperRegistryImpl: Unable to start curator PathChildrenCache
org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:121) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
    at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:232) ~[curator-client-2.7.1.jar:?]
    at org.apache.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:148) ~[curator-client-2.7.1.jar:?]
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[curator-client-2.7.1.jar:?]
    at org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:141) ~[curator-client-2.7.1.jar:?]
    at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99) ~[curator-client-2.7.1.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.rebuild(PathChildrenCache.java:323) ~[curator-recipes-2.7.1.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:300) ~[curator-recipes-2.7.1.jar:?]
    at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.checkPathChildrenCache(LlapZookeeperRegistryImpl.java:827) [hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
    at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:790) [hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
    at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:139) [hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
    at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:579) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
    at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.run(LlapStatusServiceDriver.java:285) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
    at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.main(LlapStatusServiceDriver.java:914) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
ERROR cli.LlapStatusServiceDriver: FAILED: Failed to get instances from llap registry
org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver$LlapStatusCliException: Failed to get instances from llap registry
    at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:581) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
    at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.run(LlapStatusServiceDriver.java:285) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
    at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.main(LlapStatusServiceDriver.java:914) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive
    at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.checkPathChildrenCache(LlapZookeeperRegistryImpl.java:836) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
    at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:790) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
    at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:139) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
    at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:579) ~[hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
    ... 2 more
Caused by: org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:121) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
    at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:232) ~[curator-client-2.7.1.jar:?]
    at org.apache.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:148) ~[curator-client-2.7.1.jar:?]
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[curator-client-2.7.1.jar:?]
    at org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:141) ~[curator-client-2.7.1.jar:?]
    at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99) ~[curator-client-2.7.1.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.rebuild(PathChildrenCache.java:323) ~[curator-recipes-2.7.1.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:300) ~[curator-recipes-2.7.1.jar:?]
    at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.checkPathChildrenCache(LlapZookeeperRegistryImpl.java:827) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
    at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:790) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
    at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:139) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
    at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:579) ~[hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
    ... 2 more
FAILED: Failed to get instances from llap registry
INFO LlapStatusServiceDriverConsole: LLAP status unknown
INFO LlapStatusServiceDriverConsole: --------------------------------------------------------------------------------
WARN cli.LlapStatusServiceDriver: Watch mode enabled and got LLAP registry error. Retrying..
WARN impl.LlapZookeeperRegistryImpl: The cluster is not started yet (InvalidACL); will retry
Correspondingly, the LLAP YARN application log shows the following:
Caused by: java.io.IOException: Login failure for hive/bug-86157-7.openstacklocal@EXAMPLE.COM from keytab /etc/security/keytabs/hive.llap.zk.sm.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1351) ~[hadoop-common-2.7.3.2.6.2.0-178.jar:?]
    at org.apache.hadoop.hive.llap.LlapUtil.loginWithKerberos(LlapUtil.java:78) ~[hive-exec-2.1.0.2.6.2.0-178.jar:2.1.0.2.6.2.0-178]
    at org.apache.hadoop.hive.llap.security.SecretManager.createLlapZkConf(SecretManager.java:202) ~[hive-exec-2.1.0.2.6.2.0-178.jar:2.1.0.2.6.2.0-178]
    ... 4 more
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user
    at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897) ~[?:1.8.0_141]
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760) ~[?:1.8.0_141]
    at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) ~[?:1.8.0_141]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_141]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_141]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_141]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_141]
    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) ~[?:1.8.0_141]
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) ~[?:1.8.0_141]
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) ~[?:1.8.0_141]
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) ~[?:1.8.0_141]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_141]
    at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) ~[?:1.8.0_141]
    at javax.security.auth.login.LoginContext.login(LoginContext.java:587) ~[?:1.8.0_141]
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1340) ~[hadoop-common-2.7.3.2.6.2.0-178.jar:?]
    at org.apache.hadoop.hive.llap.LlapUtil.loginWithKerberos(LlapUtil.java:78) ~[hive-exec-2.1.0.2.6.2.0-178.jar:2.1.0.2.6.2.0-178]
    at org.apache.hadoop.hive.llap.security.SecretManager.createLlapZkConf(SecretManager.java:202) ~[hive-exec-2.1.0.2.6.2.0-178.jar:2.1.0.2.6.2.0-178]
REASON: After the IOP-to-HDP upgrade, the keytab file "hive.llap.zk.sm.keytab" is readable only by its owner (yarn).
# ls -al /etc/security/keytabs/hive.llap.zk.sm.keytab
-r--------. 1 yarn hadoop 428 Aug 15 19:34 hive.llap.zk.sm.keytab
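Thus, the hive user cannot read hive.llap.zk.sm.keytab, which is why the Kerberos login in the LLAP log fails with "Unable to obtain password from user".
The group has no read permission because the IOP kerberos.json file for YARN has the group permissions removed for this keytab.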
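The permission check described above can be scripted so it is easy to repeat before and after the fix. The following is only a minimal sketch: the function name group_can_read is hypothetical (not part of HDP or Ambari), and it assumes GNU coreutils stat (the -c option), as found on typical Linux cluster nodes.

```shell
#!/bin/sh
# Hypothetical helper (not part of HDP or Ambari): returns 0 if the file's
# group has read permission, 1 if it does not, and 2 if stat fails.
# Assumes GNU coreutils stat, which supports "-c %a" (octal mode, e.g. 400).
group_can_read() {
    mode=$(stat -c %a "$1") || return 2
    # Prefix with 0 so the shell parses the mode as octal, shift away the
    # three "other" bits, then test the group read bit (value 4).
    [ $(( (0$mode >> 3) & 4 )) -ne 0 ]
}

# Example usage on a cluster node (path taken from the log above):
#   group_can_read /etc/security/keytabs/hive.llap.zk.sm.keytab \
#       && echo "group readable" || echo "group NOT readable"
```

With the listing shown above (mode -r--------), the function returns non-zero. A more direct check, assuming sudo is available on the node, is to run "sudo -u hive test -r /etc/security/keytabs/hive.llap.zk.sm.keytab" and inspect the exit status.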
FIX: Regenerating the keytabs via Ambari fixes the permission issue: the keytab becomes group readable, so the hive user, being a member of the hadoop group, can now read it.
# ls -al /etc/security/keytabs/hive.llap.zk.sm.keytab
-r--r-----. 1 yarn hadoop 428 Aug 15 20:26 hive.llap.zk.sm.keytab
- Once regeneration is done, start HSI again.
P.S.: Hive Server Interactive (HSI) is made up of two sub-components: LLAP and Hive2/HiveServer2.