Member since 03-15-2023
Posts: 2
Kudos Received: 0
Solutions: 0
03-21-2023 02:24 AM
Hi @JimHalfpenny, this fatal failure happened while it was still running. I checked the ZooKeeper log and found these messages:

2022-12-07 02:39:07,482 [myid:3] - INFO [CommitProcessor:3:LearnerSessionTracker@116] - Committing global session 0x1000048d7880001
2023-02-25 08:29:09,649 [myid:3] - INFO [NIOWorkerThread-2:Learner@158] - Revalidating client: 0x1000048d7880001
2023-02-25 20:27:42,921 [myid:3] - INFO [RequestThrottler:QuorumZooKeeperServer@163] - Submitting global closeSession request for session 0x1000048d7880001

I'm not sure whether this is a ZooKeeper session timeout like the case described in this thread:
https://community.cloudera.com/t5/Support-Questions/Zookeeper-average-client-session-timeout/td-p/289061
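To line up the ZooKeeper entries for this session, a minimal Python sketch like the one below could be used. It only parses the three log lines quoted above. The helper name `parse_zk_line` is hypothetical, and the regular expression assumes the log layout seen in those lines rather than any documented format.

```python
import re
from datetime import datetime

# The three ZooKeeper log lines quoted above, copied verbatim.
ZK_LINES = [
    "2022-12-07 02:39:07,482 [myid:3] - INFO [CommitProcessor:3:LearnerSessionTracker@116] - Committing global session 0x1000048d7880001",
    "2023-02-25 08:29:09,649 [myid:3] - INFO [NIOWorkerThread-2:Learner@158] - Revalidating client: 0x1000048d7880001",
    "2023-02-25 20:27:42,921 [myid:3] - INFO [RequestThrottler:QuorumZooKeeperServer@163] - Submitting global closeSession request for session 0x1000048d7880001",
]

# Assumed layout: "<timestamp> [myid:N] - LEVEL [thread:class@line] - message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[myid:(?P<myid>\d+)\] - (?P<level>\w+)\s+\[(?P<source>[^\]]+)\] - (?P<msg>.*)$"
)


def parse_zk_line(line):
    """Return (timestamp, message) for one log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group("msg")


SESSION_ID = "0x1000048d7880001"

# Keep only the entries that mention the session of interest.
events = [e for e in map(parse_zk_line, ZK_LINES) if e and SESSION_ID in e[1]]

revalidated = next(ts for ts, msg in events if msg.startswith("Revalidating client"))
closed = next(ts for ts, msg in events if "closeSession" in msg)
gap = closed - revalidated

print(f"{len(events)} entries for session {SESSION_ID}")
print(f"revalidation -> closeSession request: {gap}")
```

On these three lines, the closeSession request is logged just under 12 hours after the revalidation at 08:29:09, so the two entries are probably separate events rather than one immediate expiry. That is only a reading of the timestamps.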
03-16-2023 12:41 AM
Hi everyone,

Please help me find the root cause of this failure and why the YARN ResourceManager did not fail over automatically on my cluster. I get the following messages around the fatal event in yarn-resource.log on the active YARN ResourceManager node:

2023-02-25 08:04:03,805 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1670355762504_0824_01_000007 Container Transitioned from ACQUIRED to RELEASED
2023-02-25 08:29:08,810 WARN org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 6669ms for sessionid 0x1000048d7880001
2023-02-25 08:29:08,810 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 6669ms for sessionid 0x1000048d7880001, closing socket connection and attempting reconnect
2023-02-25 08:29:08,911 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session disconnected. Entering neutral mode...
2023-02-25 08:29:08,911 WARN org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService: Lost contact with Zookeeper. Transitioning to standby in 10000 ms if connection is not reestablished.
2023-02-25 08:29:09,647 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server yarn-rm1.hostname/10.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error)
2023-02-25 08:29:09,647 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to yarn-rm1.hostname/10.x.x.x:2181, initiating session
2023-02-25 08:29:09,686 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server yarn-rm1.hostname/10.x.x.x:2181, sessionid = 0x1000048d7880001, negotiated timeout = 10000
2023-02-25 08:29:09,686 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2023-02-25 08:29:09,698 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/app/hadoop-3.2.2/etc/hadoop/yarn-site.xml
2023-02-25 08:29:09,700 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state
2023-02-25 08:29:09,707 WARN org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher: org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2023-02-25 08:29:09,716 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
2023-02-25 08:29:09,727 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8032
2023-02-25 08:29:09,730 INFO org.apache.hadoop.ipc.Server: Stopping server on 8030
2023-02-25 08:29:09,734 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2023-02-25 08:29:09,737 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8030
2023-02-25 08:29:09,737 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2023-02-25 08:29:09,740 INFO org.apache.hadoop.ipc.Server: Stopping server on 8031
2023-02-25 08:29:09,748 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2023-02-25 08:29:09,748 ERROR org.apache.hadoop.yarn.event.EventDispatcher: Returning, interrupted : java.lang.InterruptedException
2023-02-25 08:29:09,748 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.ActivitiesManager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.ActivitiesManager thread interrupted
2023-02-25 08:29:09,749 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher is draining to stop, ignoring any new events.
2023-02-25 08:29:09,749 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: NMLivelinessMonitor thread interrupted
2023-02-25 08:29:09,750 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8031
2023-02-25 08:29:09,751 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher is draining to stop, ignoring any new events.
2023-02-25 08:29:09,751 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor thread interrupted
2023-02-25 08:29:09,752 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer thread interrupted
2023-02-25 08:29:09,753 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2023-02-25 08:29:09,751 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted
2023-02-25 08:29:09,755 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager metrics system...
2023-02-25 08:29:09,755 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system stopped.
2023-02-25 08:29:09,756 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted
2023-02-25 08:29:09,758 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete.
2023-02-25 08:29:09,758 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher is draining to stop, ignoring any new events.
2023-02-25 08:29:09,759 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher
2023-02-25 08:29:09,761 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: NMTokenKeyRollingInterval: 86400000ms and NMTokenKeyActivationDelay: 900000ms
2023-02-25 08:29:09,761 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager: ContainerTokenKeyRollingInterval: 86400000ms and ContainerTokenKeyActivationDelay: 900000ms
2023-02-25 08:29:09,761 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: AMRMTokenKeyRollingInterval: 86400000ms and AMRMTokenKeyActivationDelay: 900000 ms
2023-02-25 08:29:09,762 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
2023-02-25 08:29:09,762 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.NodesListManager
2023-02-25 08:29:09,762 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Using Scheduler: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
2023-02-25 08:29:09,763 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType for class org.apache.hadoop.yarn.event.EventDispatcher
2023-02-25 08:29:09,763 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher
2023-02-25 08:29:09,763 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher
2023-02-25 08:29:09,763 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher
2023-02-25 08:29:09,767 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties

This is my yarn-site.xml on the ResourceManager node:

<configuration>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/nm-local-dir</value>
  </property>
  <property>
    <name>yarn.node-labels.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.node-attribute.fs-store.root-dir</name>
    <value>file:///app/tmp/hadoop-yarn-yarn/node-attribute</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>yarn-rm1.hostname</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>32768</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>6</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>32768</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>6</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
  <property>
    <name>yarn.resourcemanager.principal</name>
    <value>yarn/yarn-rm1.hostname@MYREALM</value>
  </property>
  <property>
    <name>yarn.resourcemanager.keytab</name>
    <value>/app/keytabs/hdfs.keytab</value>
  </property>
  <property>
    <name>yarn.nodemanager.principal</name>
    <value>yarn/yarn-nm1.hostname@MYREALM</value>
  </property>
  <property>
    <name>yarn.nodemanager.keytab</name>
    <value>/app/keytabs/hdfs.keytab</value>
  </property>
  <property>
    <name>yarn.http.policy</name>
    <value>HTTPS_ONLY</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>0.0.0.0:8089</value>
  </property>
  <property>
    <name>yarn.nodemanager.webapp.https.address</name>
    <value>0.0.0.0:8090</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-rm</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>yarn-rm1.hostname</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>yarn-rm2.hostname</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>yarn-rm1.hostname:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>yarn-rm2.hostname:8088</value>
  </property>
  <property>
    <name>hadoop.zk.address</name>
    <value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value>
  </property>
</configuration>

More detail: I checked the rest of the environment, and ZooKeeper, HDFS, and the network connections all appear healthy.

Could anyone review this and suggest what else I should set in yarn-site.xml, and what I should fix in this case?

Thank you.
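For reference, the timing in the ResourceManager log above can be worked out with a short Python sketch. The timestamps and millisecond values are copied from the quoted log lines. The dictionary keys and the helper `ms_between` are hypothetical names, and the two-thirds rule for the ZooKeeper client read timeout is my understanding of the client's behavior rather than something the log states.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S,%f"

# Timestamps copied from the ResourceManager log lines quoted above.
EVENTS = {
    "client_timeout": "2023-02-25 08:29:08,810",         # "Client session timed out"
    "session_disconnected": "2023-02-25 08:29:08,911",   # "Session disconnected"
    "session_reconnected": "2023-02-25 08:29:09,686",    # "Session connected."
    "transition_to_standby": "2023-02-25 08:29:09,700",  # "Transitioning to standby state"
}

NEGOTIATED_TIMEOUT_MS = 10000  # "negotiated timeout = 10000"
OBSERVED_SILENCE_MS = 6669     # "have not heard from server in 6669ms"
STANDBY_GRACE_MS = 10000       # "Transitioning to standby in 10000 ms if ..."


def ms_between(start, end):
    """Whole milliseconds elapsed between two named events."""
    delta = datetime.strptime(EVENTS[end], FMT) - datetime.strptime(EVENTS[start], FMT)
    return delta // timedelta(milliseconds=1)


# Assumption: the client stops waiting after about 2/3 of the session timeout.
read_timeout_ms = NEGOTIATED_TIMEOUT_MS * 2 // 3

disconnected_for_ms = ms_between("session_disconnected", "session_reconnected")
standby_after_ms = ms_between("session_disconnected", "transition_to_standby")

print(f"expected read timeout: {read_timeout_ms} ms, observed: {OBSERVED_SILENCE_MS} ms")
print(f"session disconnected for: {disconnected_for_ms} ms")
print(f"standby transition after disconnect: {standby_after_ms} ms "
      f"(grace period {STANDBY_GRACE_MS} ms)")
```

If this reading is right, the 6669 ms of silence roughly matches two thirds of the negotiated 10000 ms timeout, and the transition to standby starts well under a second after the disconnect, that is, after the session was reconnected and long before the 10000 ms grace period would have expired. This is only arithmetic on the timestamps, not a confirmed root cause.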
Labels:
- Apache YARN
- MapReduce