Created on 12-25-201610:30 AM - edited on 02-05-202012:13 AM by gzigldrum
SYMPTOMS: Due to both Resource manager getting active simultaneously , all node managers crash. Errors visible in RM logs are as follows:
2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED
2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interruptedjava.lang.InterruptedExceptionat java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
atjava.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
While AsyncDispatcher is in hung state, we keep getting below errors:-
2015-06-27 20:08:35,926 INFO [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(140)) - AsyncDispatcher is draining to stop, igonring any new events.
2015-06-27 20:08:36,926 INFO [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2015-06-27 20:08:37,927 INFO [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
ROOT CAUSE: This a known issue reported in YARN-3878
WORKAROUND: Stop one resource manager and start another manually to resume services.