PROBLEM:

After upgrading from IOP (v4.2.5) to HDP (v2.6.x) on a Kerberized cluster, Hive Server Interactive (HSI) fails to start, and the HSI start log shows the following:

WARN impl.LlapZookeeperRegistryImpl: The cluster is not started yet (InvalidACL); will retry
ERROR impl.LlapZookeeperRegistryImpl: Unable to start curator PathChildrenCache
org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:121) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
	at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:232) ~[curator-client-2.7.1.jar:?]
	at org.apache.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:148) ~[curator-client-2.7.1.jar:?]
	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[curator-client-2.7.1.jar:?]
	at org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:141) ~[curator-client-2.7.1.jar:?]
	at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99) ~[curator-client-2.7.1.jar:?]
	at org.apache.curator.framework.recipes.cache.PathChildrenCache.rebuild(PathChildrenCache.java:323) ~[curator-recipes-2.7.1.jar:?]
	at org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:300) ~[curator-recipes-2.7.1.jar:?]
	at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.checkPathChildrenCache(LlapZookeeperRegistryImpl.java:827) [hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
	at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:790) [hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
	at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:139) [hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
	at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:579) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
	at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.run(LlapStatusServiceDriver.java:285) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
	at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.main(LlapStatusServiceDriver.java:914) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
ERROR cli.LlapStatusServiceDriver: FAILED: Failed to get instances from llap registry
org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver$LlapStatusCliException: Failed to get instances from llap registry
	at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:581) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
	at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.run(LlapStatusServiceDriver.java:285) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
	at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.main(LlapStatusServiceDriver.java:914) [hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive
	at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.checkPathChildrenCache(LlapZookeeperRegistryImpl.java:836) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
	at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:790) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
	at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:139) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
	at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:579) ~[hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
	... 2 more
Caused by: org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:121) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[zookeeper-3.4.6.2.6.2.0-173.jar:3.4.6-173--1]
	at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:232) ~[curator-client-2.7.1.jar:?]
	at org.apache.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:148) ~[curator-client-2.7.1.jar:?]
	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[curator-client-2.7.1.jar:?]
	at org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:141) ~[curator-client-2.7.1.jar:?]
	at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99) ~[curator-client-2.7.1.jar:?]
	at org.apache.curator.framework.recipes.cache.PathChildrenCache.rebuild(PathChildrenCache.java:323) ~[curator-recipes-2.7.1.jar:?]
	at org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:300) ~[curator-recipes-2.7.1.jar:?]
	at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.checkPathChildrenCache(LlapZookeeperRegistryImpl.java:827) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
	at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:790) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
	at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:139) ~[hive-exec-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
	at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:579) ~[hive-llap-server-2.1.0.2.6.2.0-173.jar:2.1.0.2.6.2.0-173]
	... 2 more
FAILED: Failed to get instances from llap registry
INFO LlapStatusServiceDriverConsole: LLAP status unknown
INFO LlapStatusServiceDriverConsole: --------------------------------------------------------------------------------
WARN cli.LlapStatusServiceDriver: Watch mode enabled and got LLAP registry error. Retrying..
WARN impl.LlapZookeeperRegistryImpl: The cluster is not started yet (InvalidACL); will retry

The LLAP YARN application log shows the corresponding login failure:

Caused by: java.io.IOException: Login failure for hive/bug-86157-7.openstacklocal@EXAMPLE.COM from keytab /etc/security/keytabs/hive.llap.zk.sm.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user

        at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1351) ~[hadoop-common-2.7.3.2.6.2.0-178.jar:?]
        at org.apache.hadoop.hive.llap.LlapUtil.loginWithKerberos(LlapUtil.java:78) ~[hive-exec-2.1.0.2.6.2.0-178.jar:2.1.0.2.6.2.0-178]
        at org.apache.hadoop.hive.llap.security.SecretManager.createLlapZkConf(SecretManager.java:202) ~[hive-exec-2.1.0.2.6.2.0-178.jar:2.1.0.2.6.2.0-178]
        ... 4 more
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user

        at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897) ~[?:1.8.0_141]
        at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760) ~[?:1.8.0_141]
        at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) ~[?:1.8.0_141]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_141]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_141]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_141]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_141]
        at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) ~[?:1.8.0_141]
        at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) ~[?:1.8.0_141]
        at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) ~[?:1.8.0_141]
        at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) ~[?:1.8.0_141]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_141]
        at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) ~[?:1.8.0_141]
        at javax.security.auth.login.LoginContext.login(LoginContext.java:587) ~[?:1.8.0_141]
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1340) ~[hadoop-common-2.7.3.2.6.2.0-178.jar:?]
        at org.apache.hadoop.hive.llap.LlapUtil.loginWithKerberos(LlapUtil.java:78) ~[hive-exec-2.1.0.2.6.2.0-178.jar:2.1.0.2.6.2.0-178]
        at org.apache.hadoop.hive.llap.security.SecretManager.createLlapZkConf(SecretManager.java:202) ~[hive-exec-2.1.0.2.6.2.0-178.jar:2.1.0.2.6.2.0-178]

REASON: After the IOP-to-HDP upgrade, the keytab file "hive.llap.zk.sm.keytab" is readable only by its owner (yarn).

# ls -al /etc/security/keytabs/hive.llap.zk.sm.keytab 
-r--------. 1 yarn      hadoop  428 Aug 15 19:34 hive.llap.zk.sm.keytab

As a result, the hive user cannot read hive.llap.zk.sm.keytab, and the Kerberos login from that keytab fails with "Unable to obtain password from user".

The group lacks read permission because the IOP kerberos.json file for YARN removes group access for this keytab.
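A quick way to confirm this condition is to check the group read bit of the keytab's mode string. The following is only a minimal sketch: the helper name group_can_read and the KEYTAB variable are made up for illustration, and it assumes GNU stat (coreutils) is available on the host.

```shell
#!/bin/sh
# Hypothetical helper (not part of HDP or Ambari): succeeds if a symbolic
# mode string such as "-r--r-----." has the group read bit set.
group_can_read() {
    # Character 1 is the file type, 2-4 are owner bits, 5 is group read.
    case "$1" in
        ????r*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Assumed keytab location, taken from the log above; override if needed.
KEYTAB="${KEYTAB:-/etc/security/keytabs/hive.llap.zk.sm.keytab}"

if [ -e "$KEYTAB" ]; then
    mode=$(stat -c '%A' "$KEYTAB")   # GNU stat: symbolic permissions
    if group_can_read "$mode"; then
        echo "$KEYTAB ($mode): group can read"
    else
        echo "$KEYTAB ($mode): group CANNOT read"
    fi
fi
```

On an affected host, the mode shown in the ls output above (-r--------.) would take the "CANNOT read" branch.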

FIX: Regenerate the keytabs via Ambari. This restores group read permission on the keytab, so the hive user, being a member of the hadoop group, can read it.

# ls -al /etc/security/keytabs/hive.llap.zk.sm.keytab 
-r--r-----. 1 yarn hadoop 428 Aug 15 20:26 hive.llap.zk.sm.keytab
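The fix relies on the hive user belonging to the hadoop group. A minimal sketch for checking that membership is shown here; the helper name in_group_list and the SVC_USER / SVC_GROUP variables are assumptions made for illustration, not part of any HDP tooling.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if GROUP ($1) appears in the
# space-separated group LIST ($2), as printed by `id -nG`.
in_group_list() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Assumed names, matching the user and group discussed in this article.
SVC_USER="${SVC_USER:-hive}"
SVC_GROUP="${SVC_GROUP:-hadoop}"

# Only run the check if the user exists on this host.
if id "$SVC_USER" >/dev/null 2>&1; then
    if in_group_list "$SVC_GROUP" "$(id -nG "$SVC_USER")"; then
        echo "$SVC_USER is a member of $SVC_GROUP"
    else
        echo "$SVC_USER is NOT a member of $SVC_GROUP"
    fi
fi
```

If the user turns out not to be in the group, group read permission on the keytab alone would not be enough for the login to succeed.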

Once the keytabs have been regenerated, start HSI again.

P.S.: Hive Server Interactive (HSI) is made up of two sub-components: LLAP and Hive2/HiveServer2.
