Created on 04-18-2017 01:50 AM - edited 08-17-2019 01:19 PM
Issue:
LLAP, a component of HSI (HiveServer2 Interactive, Tech Preview), fails to start in a kerberized setup because of missing keytabs.
When HSI is started, its LLAP component fails with the trace below:
INFO impl.LlapRegistryService: Using LLAP registry (client) type: Service LlapRegistryService in state LlapRegistryService: STARTED
INFO state.ConnectionStateManager: State change: CONNECTED
ERROR impl.LlapZookeeperRegistryImpl: Unable to start curator PathChildrenCache. Exception: {}
org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:121) ~[zookeeper-3.4.6.2.5.3.0-37.jar:3.4.6-37--1]
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.2.5.3.0-37.jar:3.4.6-37--1]
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[zookeeper-3.4.6.2.5.3.0-37.jar:3.4.6-37--1]
    at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:232) ~[curator-client-2.7.1.jar:?]
    at org.apache.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:148) ~[curator-client-2.7.1.jar:?]
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[curator-client-2.7.1.jar:?]
    at org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:141) ~[curator-client-2.7.1.jar:?]
    at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99) ~[curator-client-2.7.1.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.rebuild(PathChildrenCache.java:323) ~[curator-recipes-2.7.1.jar:?]
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:300) ~[curator-recipes-2.7.1.jar:?]
    at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.checkPathChildrenCache(LlapZookeeperRegistryImpl.java:757) [hive-exec-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
    at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:725) [hive-exec-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
    at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:129) [hive-exec-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
    at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:490) [hive-llap-server-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
    at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.run(LlapStatusServiceDriver.java:245) [hive-llap-server-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
    at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.main(LlapStatusServiceDriver.java:941) [hive-llap-server-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
ERROR cli.LlapStatusServiceDriver: FAILED: Failed to get instances from llap registry
org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver$LlapStatusCliException: Failed to get instances from llap registry
    at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:492) [hive-llap-server-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
    at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.run(LlapStatusServiceDriver.java:245) [hive-llap-server-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
    at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.main(LlapStatusServiceDriver.java:941) [hive-llap-server-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
Caused by: java.io.IOException: org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive
    at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.checkPathChildrenCache(LlapZookeeperRegistryImpl.java:760) ~[hive-exec-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
    at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl.getInstances(LlapZookeeperRegistryImpl.java:725) ~[hive-exec-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
    at org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getInstances(LlapRegistryService.java:129) ~[hive-exec-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
    at org.apache.hadoop.hive.llap.cli.LlapStatusServiceDriver.populateAppStatusFromLlapRegistry(LlapStatusServiceDriver.java:490) ~[hive-llap-server-2.1.0.2.5.3.0-37.jar:2.1.0.2.5.3.0-37]
    ... 2 more
This can happen when HSI is enabled after the cluster has been kerberized.
Reason:
- HSI needs two keytab files, 'hive.service.keytab' and 'hive.llap.zk.sm.keytab', to be present on all of YARN's NodeManager nodes.
- If HSI is not enabled before the cluster is kerberized, these two keytab files are not distributed to all the NodeManager nodes, unlike when HSI is enabled before kerberization.
As a result, LLAP fails with the error:
Caused by: org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /llap-sasl/user-hive at org.apache.zookeeper.KeeperException.create(KeeperException.java:121) ~[zookeeper-3.4.6.2.5.0.0-1245.jar:3.4.6-1245--1]
because the ZK node has not been created and is missing:
[zk: localhost:2181(CONNECTED) 3] ls /llap-sasl
[]
The empty result shows that the ZK node is missing.
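A quick way to check a given NodeManager host for the two keytabs is sketched below in Python. This is only an illustration: the helper name missing_keytabs is hypothetical, and the directory /etc/security/keytabs is an assumption about where keytabs are stored on the host, so adjust it to match your cluster.

```python
import os

# Keytab file names as given in the Reason section of this article.
REQUIRED_KEYTABS = ("hive.service.keytab", "hive.llap.zk.sm.keytab")


def missing_keytabs(keytab_dir, required=REQUIRED_KEYTABS):
    """Return the required keytab file names that are absent from keytab_dir."""
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(keytab_dir, name))
    ]


if __name__ == "__main__":
    # Assumption: keytabs live under /etc/security/keytabs on this host.
    missing = missing_keytabs("/etc/security/keytabs")
    if missing:
        print("Missing keytabs:", ", ".join(missing))
    else:
        print("Both keytabs are present.")
```

Run it on each NodeManager host; any file name it reports as missing points to a host that did not receive the keytabs.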
Resolution:
- Regenerate the keytabs from the Ambari Kerberos page. This distributes the above keytab files to all NodeManager nodes.
- Also confirm that Hive's config hive.llap.zk.sm.connectionString is set to the list of all ZooKeeper nodes in the cluster. For example:
zk.host1.org:2181,zk.host2.org:2181,zk.host3.org:2181
The list of ZooKeeper nodes can be obtained from here:
Remember to append the port numbers as shown in the example.
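As a rough sketch of the format expected for hive.llap.zk.sm.connectionString, the Python helper below joins a list of ZooKeeper hosts into a comma-separated host:port string and appends a port to any host that lacks one. The function name build_zk_connection_string is hypothetical, and the default port 2181 is taken from the example above; use whatever client port your ZooKeeper servers actually listen on.

```python
def build_zk_connection_string(hosts, default_port=2181):
    """Join ZooKeeper hosts into 'host:port,host:port,...' form.

    Hosts that already carry a port keep it; hosts without one get
    default_port appended (2181 here, as in the article's example).
    """
    entries = []
    for host in hosts:
        host = host.strip()
        if not host:
            continue  # ignore blank entries
        if ":" in host:
            name, port = host.rsplit(":", 1)
            if not port.isdigit():
                raise ValueError(f"invalid port in {host!r}")
            entries.append(f"{name}:{port}")
        else:
            entries.append(f"{host}:{default_port}")
    return ",".join(entries)


if __name__ == "__main__":
    # Host names are the placeholders used in the article's example.
    print(build_zk_connection_string(
        ["zk.host1.org", "zk.host2.org:2181", "zk.host3.org"]
    ))
```

The resulting string can be compared against the value currently set in Hive's configuration before restarting HSI.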
Restart HSI to confirm the behavior.