Created 05-30-2018 01:56 PM
Hi Team,
The ResourceManager service stops automatically within a few seconds of starting.
I have not found any errors or exceptions in the ResourceManager logs. I suspect there is an issue with ZooKeeper. I have three ZooKeeper servers; below are the logs from the ResourceManager and from two of the ZooKeeper servers.
Resource Manager Logs:
2018-05-30 09:08:38,058 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir
2018-05-30 09:08:38,058 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
2018-05-30 09:08:38,058 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.name=Linux
2018-05-30 09:08:38,058 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
2018-05-30 09:08:38,059 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.version=3.10.0-693.el7.x86_64
2018-05-30 09:08:38,059 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.name=yarn
2018-05-30 09:08:38,059 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.home=/home/yarn
2018-05-30 09:08:38,059 INFO zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/usr/hdp/2.6.3.0-235/hadoop-yarn
2018-05-30 09:08:38,060 INFO zookeeper.ZooKeeper (ZooKeeper.java:<init>(438)) - Initiating client connection, connectString=hdp01.mydomain.com:2181,hdp03.mydomain.com:2181,hdp02.mydomain.com:2181 sessionTimeout=10000 watcher=null
2018-05-30 09:08:38,179 INFO recovery.ZKRMStateStore (ZKRMStateStore.java:createConnection(1276)) - Created new ZK connection
2018-05-30 09:08:38,200 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server hdp02.mydomain.com/192.168.3.19:2181. Will not attempt to authenticate using SASL (unknown error)
2018-05-30 09:08:38,220 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(864)) - Socket connection established, initiating session, client: /192.168.3.18:56340, server: hdp02.mydomain.com/192.168.3.19:2181
2018-05-30 09:08:38,279 INFO zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1279)) - Session establishment complete on server hdp02.mydomain.com/192.168.3.19:2181, sessionid = 0x263b11a44120001, negotiated timeout = 10000
2018-05-30 09:08:38,495 INFO recovery.ZKRMStateStore (ZKRMStateStore.java:run(359)) - Fencing node /rmstore/ZKRMStateRoot/RM_ZK_FENCING_LOCK doesn't exist to delete
2018-05-30 09:08:38,793 INFO resourcemanager.ResourceManager (ResourceManager.java:serviceStart(597)) - Recovery started
2018-05-30 09:08:38,851 INFO recovery.RMStateStore (RMStateStore.java:checkVersion(639)) - Loaded RM state version info 1.2
Zookeeper 1 logs:
2018-05-30 09:08:38,208 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.3.18:56340
2018-05-30 09:08:38,246 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /192.168.3.18:56340
2018-05-30 09:08:38,272 - INFO [CommitProcessor:2:ZooKeeperServer@617] - Established session 0x263b11a44120001 with negotiated timeout 10000 for client /192.168.3.18:56340
2018-05-30 09:08:38,285 - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@643] - Got user-level KeeperException when processing sessionid:0x263b11a44120001 type:create cxid:0x1 zxid:0x800000042 txntype:-1 reqpath:n/a Error Path:/rmstore Error:KeeperErrorCode = NodeExists for /rmstore
2018-05-30 09:08:38,344 - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@643] - Got user-level KeeperException when processing sessionid:0x263b11a44120001 type:create cxid:0x2 zxid:0x800000043 txntype:-1 reqpath:n/a Error Path:/rmstore/ZKRMStateRoot Error:KeeperErrorCode = NodeExists for /rmstore/ZKRMStateRoot
2018-05-30 09:08:38,447 - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@590] - Got user-level KeeperException when processing sessionid:0x263b11a44120001 type:multi cxid:0x4 zxid:0x800000045 txntype:-1 reqpath:n/a aborting remaining multi ops. Error Path:/rmstore/ZKRMStateRoot/RM_ZK_FENCING_LOCK Error:KeeperErrorCode = NoNode for /rmstore/ZKRMStateRoot/RM_ZK_FENCING_LOCK
2018-05-30 09:08:38,510 - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@643] - Got user-level KeeperException when processing sessionid:0x263b11a44120001 type:create cxid:0x5 zxid:0x800000046 txntype:-1 reqpath:n/a Error Path:/rmstore/ZKRMStateRoot/RMAppRoot Error:KeeperErrorCode = NodeExists for /rmstore/ZKRMStateRoot/RMAppRoot
2018-05-30 09:08:38,535 - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@643] - Got user-level KeeperException when processing sessionid:0x263b11a44120001 type:create cxid:0x6 zxid:0x800000047 txntype:-1 reqpath:n/a Error Path:/rmstore/ZKRMStateRoot/RMDTSecretManagerRoot Error:KeeperErrorCode = NodeExists for /rmstore/ZKRMStateRoot/RMDTSecretManagerRoot
2018-05-30 09:08:38,602 - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@643] - Got user-level KeeperException when processing sessionid:0x263b11a44120001 type:create cxid:0x7 zxid:0x800000048 txntype:-1 reqpath:n/a Error Path:/rmstore/ZKRMStateRoot/RMDTSecretManagerRoot/RMDTMasterKeysRoot Error:KeeperErrorCode = NodeExists for /rmstore/ZKRMStateRoot/RMDTSecretManagerRoot/RMDTMasterKeysRoot
2018-05-30 09:08:38,666 - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@643] - Got user-level KeeperException when processing sessionid:0x263b11a44120001 type:create cxid:0x8 zxid:0x800000049 txntype:-1 reqpath:n/a Error Path:/rmstore/ZKRMStateRoot/RMDTSecretManagerRoot/RMDelegationTokensRoot Error:KeeperErrorCode = NodeExists for /rmstore/ZKRMStateRoot/RMDTSecretManagerRoot/RMDelegationTokensRoot
2018-05-30 09:08:38,724 - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@643] - Got user-level KeeperException when processing sessionid:0x263b11a44120001 type:create cxid:0x9 zxid:0x80000004a txntype:-1 reqpath:n/a Error Path:/rmstore/ZKRMStateRoot/RMDTSecretManagerRoot/RMDTSequentialNumber Error:KeeperErrorCode = NodeExists for /rmstore/ZKRMStateRoot/RMDTSecretManagerRoot/RMDTSequentialNumber
2018-05-30 09:08:38,765 - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@643] - Got user-level KeeperException when processing sessionid:0x263b11a44120001 type:create cxid:0xa zxid:0x80000004b txntype:-1 reqpath:n/a Error Path:/rmstore/ZKRMStateRoot/AMRMTokenSecretManagerRoot Error:KeeperErrorCode = NodeExists for /rmstore/ZKRMStateRoot/AMRMTokenSecretManagerRoot
2018-05-30 09:08:45,736 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.3.19:38248
2018-05-30 09:08:45,736 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@827] - Processing ruok command from /192.168.3.19:38248
2018-05-30 09:08:45,767 - INFO [Thread-35:NIOServerCnxn@1008] - Closed socket connection for client /192.168.3.19:38248 (no session established for client)
2018-05-30 09:09:02,037 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x263b11a44120001 due to java.io.IOException: Connection reset by peer
2018-05-30 09:09:02,038 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.3.18:56340 which had sessionid 0x263b11a44120001
2018-05-30 09:09:12,009 - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x263b11a44120001, timeout of 10000ms exceeded
2018-05-30 09:09:12,017 - INFO [ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@492] - Processed session termination for sessionid: 0x263b11a44120001
Zookeeper 2 logs:
2018-05-30 09:08:46,033 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.3.18:33482
2018-05-30 09:08:46,033 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@827] - Processing ruok command from /192.168.3.18:33482
2018-05-30 09:08:46,083 - INFO [Thread-20:NIOServerCnxn@1008] - Closed socket connection for client /192.168.3.18:33482 (no session established for client)
2018-05-30 09:09:46,017 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.3.18:33584
2018-05-30 09:09:46,018 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@827] - Processing ruok command from /192.168.3.18:33584
2018-05-30 09:09:46,052 - INFO [Thread-21:NIOServerCnxn@1008] - Closed socket connection for client /192.168.3.18:33584 (no session established for client)
Please help with this. Thanks in advance.
-Paramesh.
Created 06-06-2018 10:52 PM
Hey @Paramesh malla !
Could you check whether your yarn.resourcemanager.recovery.enabled is set to true?
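If it helps, here is a minimal sketch of one way to read that property out of a Hadoop-style configuration file. It is written in Python and uses only the standard library. The helper name get_hadoop_property and the SAMPLE text are hypothetical stand-ins for illustration; on your cluster the value would come from yarn-site.xml, whose location depends on your install, or from the YARN configs in Ambari.

```python
import xml.etree.ElementTree as ET


def get_hadoop_property(xml_text, name):
    """Return the value of a Hadoop-style <property> entry, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        # Each property is a <name>/<value> pair inside <configuration>.
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical excerpt standing in for the contents of yarn-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
</configuration>"""

print(get_hadoop_property(SAMPLE, "yarn.resourcemanager.recovery.enabled"))  # prints: true
```

A result of None means the property is not set in that file, in which case the ResourceManager falls back to its built-in default.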
Created 06-14-2018 06:00 AM
Hi Vinicius, yes it is already enabled. I am still having this issue.
Created 06-20-2018 12:16 PM
It was resolved once I moved the RM to another node in the cluster.
Created 12-17-2018 04:05 PM
Can you please explain the solution in detail?