ResourceManager failed to start.

Contributor
This is a new setup of CDP 7.1.6.

Issue:
ResourceManager failed to start.

NodeManager and JobHistory Server are running.
 
8:58:49.871 AM FATAL ResourceManager
Error starting ResourceManager
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /confstore/CONF_STORE
at org.apache.zookeeper.KeeperException.create(KeeperException.java:120)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:1793)
at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:274)
at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:268)
at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:67)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:81)
at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:265)
at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:249)
at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:34)
at org.apache.hadoop.util.curator.ZKCuratorManager.delete(ZKCuratorManager.java:331)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.ZKConfigurationStore.format(ZKConfigurationStore.java:148)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.deleteRMConfStore(ResourceManager.java:1658)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1534)
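
For readers skimming the trace above, the following is a minimal Python sketch of how it can be read mechanically. It pulls out the ZooKeeper error code, the znode path, the ZooKeeper client call that failed, and the ResourceManager methods on the stack. The trace text is an abbreviated copy of the one in this post; the helper name `summarize_trace` and the regular expressions are illustrative assumptions, not part of Hadoop or ZooKeeper.

```python
import re

# Abbreviated copy of the stack trace from the post (Curator frames omitted).
TRACE = """\
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /confstore/CONF_STORE
at org.apache.zookeeper.KeeperException.create(KeeperException.java:120)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:1793)
at org.apache.hadoop.util.curator.ZKCuratorManager.delete(ZKCuratorManager.java:331)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.ZKConfigurationStore.format(ZKConfigurationStore.java:148)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.deleteRMConfStore(ResourceManager.java:1658)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1534)
"""

HEADER_RE = re.compile(r"KeeperErrorCode = (\w+) for (\S+)")
FRAME_RE = re.compile(r"^\s*at ([\w.$]+)\.(\w+)\(")


def summarize_trace(trace):
    """Hypothetical helper: extract error code, znode, and key frames."""
    header = HEADER_RE.search(trace)
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.groups())  # (class name, method name)
    # First frame inside the ZooKeeper client class is the failing call.
    zk_call = next(
        (method for cls, method in frames
         if cls == "org.apache.zookeeper.ZooKeeper"),
        None,
    )
    rm_frames = [method for cls, method in frames
                 if cls.endswith(".ResourceManager")]
    return {
        "code": header.group(1) if header else None,
        "znode": header.group(2) if header else None,
        "zk_call": zk_call,
        "rm_frames": rm_frames,
    }


summary = summarize_trace(TRACE)
print(summary)
```

Read this way, the trace says the ResourceManager was trying to delete the znode `/confstore/CONF_STORE` (via `deleteRMConfStore` and `ZKConfigurationStore.format`) and ZooKeeper refused the delete with `NoAuth`.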

 

2 REPLIES

Contributor

@vec Can you check your ZooKeeper logs? You should find the actual error there. It looks like ZK is rejecting the RM connection.

Contributor

I deployed three ZooKeeper nodes and they are running well, and the ZK logs don't show any errors. After I stopped all the ZK nodes, the RM log printed the errors below:

 

2023-12-22 07:34:11,542 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server cdp3.oia.com/192.168.1.176:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2023-12-22 07:34:11,542 INFO org.apache.zookeeper.ClientCnxn: Socket error occurred: cdp3.oia.com/192.168.1.176:2181: Connection refused
2023-12-22 07:34:11,642 WARN org.apache.zookeeper.Login: TGT renewal thread has been interrupted and will exit.
2023-12-22 07:34:11,645 INFO org.apache.zookeeper.Login: Client successfully logged in.
2023-12-22 07:34:11,645 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2023-12-22 07:34:11,645 INFO org.apache.zookeeper.Login: TGT refresh thread started.
2023-12-22 07:34:11,645 INFO org.apache.zookeeper.Login: TGT valid starting at:        Fri Dec 22 07:34:10 CST 2023
2023-12-22 07:34:11,645 INFO org.apache.zookeeper.Login: TGT expires:                  Sat Dec 23 07:34:10 CST 2023
2023-12-22 07:34:11,645 INFO org.apache.zookeeper.Login: TGT refresh sleeping until: Sat Dec 23 03:31:02 CST 2023
2023-12-22 07:34:11,645 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server cdp1.oia.com/192.168.1.205:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2023-12-22 07:34:11,645 INFO org.apache.zookeeper.ClientCnxn: Socket error occurred: cdp1.oia.com/192.168.1.205:2181: Connection refused
2023-12-22 07:34:12,746 WARN org.apache.zookeeper.Login: TGT renewal thread has been interrupted and will exit.
2023-12-22 07:34:12,748 INFO org.apache.zookeeper.Login: Client successfully logged in.
2023-12-22 07:34:12,749 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2023-12-22 07:34:12,749 INFO org.apache.zookeeper.Login: TGT refresh thread started.
2023-12-22 07:34:12,749 INFO org.apache.zookeeper.Login: TGT valid starting at:        Fri Dec 22 07:34:11 CST 2023
2023-12-22 07:34:12,749 INFO org.apache.zookeeper.Login: TGT expires:                  Sat Dec 23 07:34:11 CST 2023
2023-12-22 07:34:12,749 INFO org.apache.zookeeper.Login: TGT refresh sleeping until: Sat Dec 23 03:33:14 CST 2023
2023-12-22 07:34:12,749 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server cdp2.oia.com/192.168.1.169:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2023-12-22 07:34:12,749 INFO org.apache.zookeeper.ClientCnxn: Socket error occurred: cdp2.oia.com/192.168.1.169:2181: Connection refused

 

After I brought the ZK nodes back up, the connection appears to be established, but the RM still throws an exception at the end:

 

2023-12-22 07:34:33,729 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server cdp2.oia.com/192.168.1.169:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2023-12-22 07:34:33,730 INFO org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.1.205:56000, server: cdp2.oia.com/192.168.1.169:2181
2023-12-22 07:34:33,757 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server cdp2.oia.com/192.168.1.169:2181, sessionid = 0x2002158fc320000, negotiated timeout = 40000
2023-12-22 07:34:33,758 INFO org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED
2023-12-22 07:34:33,781 INFO org.apache.curator.framework.imps.EnsembleTracker: New config event received: {server.1=cdp1.oia.com:3181:4181:participant, version=0, server.3=cdp3.oia.com:3181:4181:participant, server.2=cdp2.oia.com:3181:4181:participant}
2023-12-22 07:34:33,784 INFO org.apache.curator.framework.imps.EnsembleTracker: New config event received: {server.1=cdp1.oia.com:3181:4181:participant, version=0, server.3=cdp3.oia.com:3181:4181:participant, server.2=cdp2.oia.com:3181:4181:participant}
2023-12-22 07:34:33,795 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /confstore/CONF_STORE
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:120)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
	at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:1793)
	at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:274)
	at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:268)
	at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:67)
	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:81)
	at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:265)
	at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:249)
	at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:34)
	at org.apache.hadoop.util.curator.ZKCuratorManager.delete(ZKCuratorManager.java:331)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.ZKConfigurationStore.format(ZKConfigurationStore.java:148)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.deleteRMConfStore(ResourceManager.java:1658)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1534)
2023-12-22 07:34:33,803 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at cdp1.oia.com/192.168.1.205
************************************************************/
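
To make the sequence in the two log excerpts above easier to follow, here is a small Python sketch that reduces the RM log to a timeline of ZooKeeper client events. The sample lines are copied from the logs in this reply; the function name `timeline` and the event labels are illustrative assumptions.

```python
import re

# A few lines copied from the RM log excerpts in this reply.
LOG = """\
2023-12-22 07:34:11,542 INFO org.apache.zookeeper.ClientCnxn: Socket error occurred: cdp3.oia.com/192.168.1.176:2181: Connection refused
2023-12-22 07:34:11,645 INFO org.apache.zookeeper.ClientCnxn: Socket error occurred: cdp1.oia.com/192.168.1.205:2181: Connection refused
2023-12-22 07:34:12,749 INFO org.apache.zookeeper.ClientCnxn: Socket error occurred: cdp2.oia.com/192.168.1.169:2181: Connection refused
2023-12-22 07:34:33,757 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server cdp2.oia.com/192.168.1.169:2181, sessionid = 0x2002158fc320000, negotiated timeout = 40000
2023-12-22 07:34:33,795 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
"""

REFUSED_RE = re.compile(r"Socket error occurred: (\S+?)/\S+: Connection refused")
SESSION_RE = re.compile(r"Session establishment complete on server (\S+?)/")


def timeline(log):
    """Hypothetical helper: return (timestamp, event, detail) tuples."""
    events = []
    for line in log.splitlines():
        stamp = line[:23]  # "YYYY-MM-DD HH:MM:SS,mmm"
        match = REFUSED_RE.search(line)
        if match:
            events.append((stamp, "refused", match.group(1)))
            continue
        match = SESSION_RE.search(line)
        if match:
            events.append((stamp, "session", match.group(1)))
            continue
        if " FATAL " in line:
            # Keep only the message after the logger name.
            events.append((stamp, "fatal", line.split(": ", 1)[1]))
    return events


events = timeline(LOG)
for event in events:
    print(event)
```

The timeline shows three refused connections while ZK was down, then a session established with cdp2.oia.com, and the FATAL error only after that. That ordering suggests the RM can reach and authenticate to ZooKeeper, and that the failure is an authorization problem on the znode rather than a connectivity problem.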

 

The workaround I used was to delete ZooKeeper, YARN Queue Manager, and YARN, and then redeploy only YARN. So far it looks good.
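
One possible explanation for why recreating the services helped, offered as an assumption rather than a confirmed diagnosis: the `/confstore` znode may have carried an ACL that did not grant the RM's identity permission to delete its children. The Python sketch below only models ZooKeeper's permission bits and the rule that deleting a znode is checked against the DELETE permission on its parent. It does not talk to a real ensemble, and the ACL entries and identities in it are hypothetical, not taken from this cluster.

```python
# ZooKeeper permission bits, as defined in ZooDefs.Perms.
READ, WRITE, CREATE, DELETE, ADMIN = 1, 2, 4, 8, 16
ALL = READ | WRITE | CREATE | DELETE | ADMIN


def can_delete_child(parent_acl, client_ids):
    """Model of the check: deleting a child needs DELETE on the parent's ACL.

    parent_acl is a list of (scheme, id, perms) tuples;
    client_ids is the set of (scheme, id) pairs the client authenticated as.
    """
    for scheme, ident, perms in parent_acl:
        applies = (scheme, ident) == ("world", "anyone") or (scheme, ident) in client_ids
        if applies and perms & DELETE:
            return True
    return False


# Hypothetical ACL on /confstore: full rights for some other identity,
# read-only for everyone else.
stale_parent_acl = [("sasl", "yarn-old", ALL), ("world", "anyone", READ)]
# Hypothetical identity the RM authenticates as.
rm_ids = {("sasl", "yarn")}

print(can_delete_child(stale_parent_acl, rm_ids))        # False: would surface as NoAuth
print(can_delete_child([("sasl", "yarn", ALL)], rm_ids))  # True: delete is allowed
```

If this is what happened, inspecting the ACL on `/confstore` before deleting the services would confirm or rule it out.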