ResourceManagers (HA) don't start

Contributor

After enabling Kerberos on the cluster (upgraded to HDP 2.5), everything was working fine. Then I installed Zeppelin, which asked me to restart a few components. After the restart, neither of the ResourceManagers starts up.

2016-12-15 10:15:08,735 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir
2016-12-15 10:15:08,735 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2016-12-15 10:15:08,735 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.name=Linux
2016-12-15 10:15:08,735 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2016-12-15 10:15:08,735 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.version=2.6.32-504.8.1.el6.x86_64
2016-12-15 10:15:08,735 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.name=yarn
2016-12-15 10:15:08,735 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.home=/home/yarn
2016-12-15 10:15:08,736 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/usr/hdp/2.5.0.0-1245/hadoop-yarn
2016-12-15 10:15:08,736 INFO zookeeper.ZooKeeper (ZooKeeper.java:<init>(438)) - Initiating client connection, connectString=xxx.com:2181,yyy.com:2181,zzz.com:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@62ef27a8
2016-12-15 10:15:08,752 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server yyy.com/IP:2181. Will not attempt to authenticate using SASL (unknown error)
2016-12-15 10:15:08,757 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(864)) - Socket connection established to yyy.com/IP:2181, initiating session
2016-12-15 10:15:08,768 INFO zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1279)) - Session establishment complete on server yyy.com/IP:2181, sessionid = 0x3590197ed680104, negotiated timeout = 10000
2016-12-15 10:15:08,784 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService failed in state INITED; cause: java.io.IOException: Couldn't create /yarn-leader-election
java.io.IOException: Couldn't create /yarn-leader-election
    at org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:350)
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceInit(EmbeddedElectorService.java:96)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:152)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:281)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1228)
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /yarn-leader-election
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
    at org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1000)
    at org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:997)
    at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1041)
    at org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:997)
    at org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:344)
    ... 9 more

1 ACCEPTED SOLUTION

Contributor

It looks like your RM doesn't have write access to the root znode (/), so it can't create /yarn-leader-election. That matches the NoAuth error in your log.

Please make sure the ACL on / is set correctly.

Contributor

Yes, that was the issue.

I had changed the ACL on / to r instead of cdrwa, and that was causing the problem. As soon as I changed it back to cdrwa, the ResourceManagers started.
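For anyone wondering why r versus cdrwa matters here, this is a minimal sketch in Python of how ZooKeeper's permission letters map to permission bits. The bit values are the ones defined in ZooKeeper's ZooDefs.Perms; the helper names (parse_perms, can_create_child) are made up for illustration and are not part of any ZooKeeper or Hadoop API. The point is that creating a child znode such as /yarn-leader-election needs the create (c) permission on the parent, which r alone does not include.

```python
# ZooKeeper permission bits, as defined in org.apache.zookeeper.ZooDefs.Perms.
# The letters are the ones zkCli uses in ACL strings such as "cdrwa".
PERM_BITS = {"r": 1, "w": 2, "c": 4, "d": 8, "a": 16}


def parse_perms(perms: str) -> int:
    """Turn a zkCli-style permission string such as 'cdrwa' into a bitmask.

    Hypothetical helper for illustration only.
    """
    mask = 0
    for letter in perms:
        if letter not in PERM_BITS:
            raise ValueError(f"unknown permission letter: {letter!r}")
        mask |= PERM_BITS[letter]
    return mask


def can_create_child(parent_perms: str) -> bool:
    """Creating a znode requires the CREATE ('c') bit on its parent."""
    return bool(parse_perms(parent_perms) & PERM_BITS["c"])


if __name__ == "__main__":
    # Compare the broken ACL (r) with the restored one (cdrwa) on the root znode.
    for acl in ("r", "cdrwa"):
        allowed = can_create_child(acl)
        print(f"ACL {acl!r} on / -> can create /yarn-leader-election: {allowed}")
```

With r the check comes back False, which corresponds to the NoAuth failure above; with cdrwa it comes back True. This only models the permission bits, it does not talk to a real ZooKeeper ensemble or account for the ACL scheme and identity in use.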

Thanks a lot 🙂