SYMPTOMS: Due to both Resource manager getting active simultaneously , all node managers crash. Errors visible in RM logs are as follows:
2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interruptedjava.lang.InterruptedExceptionat java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) atjava.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
While AsyncDispatcher is in hung state, we keep getting below errors:-
2015-06-27 20:08:35,926 INFO [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(140)) - AsyncDispatcher is draining to stop, igonring any new events. 2015-06-27 20:08:36,926 INFO [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING 2015-06-27 20:08:37,927 INFO [main] event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(144)) - Waiting for AsyncDispatcher to drain. Thread state is :WAITING
ROOT CAUSE: This a known issue reported in YARN-3878
WORKAROUND: Stop one resource manager and start another manually to resume services.
REFERENCE:
User | Count |
---|---|
763 | |
379 | |
316 | |
309 | |
270 |