YARN Service Docker Containers Restarting

Cloudera Employee

When I launch a Dockerized YARN service, the containers are removed and restarted after ~13 seconds. This repeats 20+ times before a container is eventually able to stay up. Here are entries from the RM log, where it appears to be unable to find the container:

2018-12-13 15:46:34,426 INFO  rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e04_1544715810515_0001_01_000005 Container Transitioned from ALLOCATED to ACQUIRED
2018-12-13 15:46:34,449 INFO  rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e04_1544715810515_0001_01_000005 Container Transitioned from ACQUIRED to RUNNING
2018-12-13 15:46:35,431 INFO  scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updatePendingResources(367)) - checking for deactivate of application :application_1544715810515_0001
2018-12-13 15:46:48,522 INFO  rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e04_1544715810515_0001_01_000005 Container Transitioned from RUNNING to COMPLETED
2018-12-13 15:46:50,409 INFO  zookeeper.ReadOnlyZKClient (ReadOnlyZKClient.java:run(315)) - 0x0cc62a3b no activities for 60000 ms, close active connection. Will reconnect next time when there are new requests.
2018-12-13 15:46:50,479 INFO  scheduler.AbstractYarnScheduler (AbstractYarnScheduler.java:releaseContainers(742)) - container_e04_1544715810515_0001_01_000005 doesn't exist. Add the container to the release request cache as it maybe on recovery.
2018-12-13 15:46:50,479 INFO  scheduler.AbstractYarnScheduler (AbstractYarnScheduler.java:completedContainer(669)) - Container container_e04_1544715810515_0001_01_000005 completed with event RELEASED, but corresponding RMContainer doesn't exist.

Re: YARN Service Docker Containers Restarting

Rising Star

Can you check which node container_e04_1544715810515_0001_01_000005 was assigned to, and get the NodeManager (NM) logs from that host for this failure window?
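
To pin down the failure window before pulling the NM logs, something like the following rough Python sketch may help. It scans RM log text for the RMContainerImpl state transitions of one container and reports when it entered RUNNING and when it reached COMPLETED. The names `container_transitions`, `failure_window`, and `SAMPLE_RM_LOG` are made up for illustration, and the regular expression assumes the log line format shown in the question; other log layouts would need a different pattern.

```python
import re
from datetime import datetime

# Matches RMContainerImpl state-transition entries in the format quoted in the
# question (assumption: other RM log layouts may need a different pattern).
TRANSITION = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO\s+"
    r"rmcontainer\.RMContainerImpl \S+ - "
    r"(container_\w+) Container Transitioned from ([A-Z_]+) to ([A-Z_]+)"
)

# Transition entries copied from the RM log in the question.
SAMPLE_RM_LOG = """
2018-12-13 15:46:34,426 INFO  rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e04_1544715810515_0001_01_000005 Container Transitioned from ALLOCATED to ACQUIRED
2018-12-13 15:46:34,449 INFO  rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e04_1544715810515_0001_01_000005 Container Transitioned from ACQUIRED to RUNNING
2018-12-13 15:46:48,522 INFO  rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e04_1544715810515_0001_01_000005 Container Transitioned from RUNNING to COMPLETED
"""


def container_transitions(log_text, container_id):
    """Return (timestamp, old_state, new_state) tuples for one container."""
    result = []
    for ts, cid, old, new in TRANSITION.findall(log_text):
        if cid == container_id:
            when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")
            result.append((when, old, new))
    return result


def failure_window(transitions):
    """Return (first RUNNING time, first COMPLETED time); None if not seen."""
    start = next((t for t, _, new in transitions if new == "RUNNING"), None)
    end = next((t for t, _, new in transitions if new == "COMPLETED"), None)
    return start, end


if __name__ == "__main__":
    cid = "container_e04_1544715810515_0001_01_000005"
    start, end = failure_window(container_transitions(SAMPLE_RM_LOG, cid))
    print("RUNNING at:  ", start)
    print("COMPLETED at:", end)
    if start and end:
        print("seconds up:  ", (end - start).total_seconds())
```

The two timestamps give the interval to look at in the NM log on the host that ran the container. The pattern matches on the whole text rather than line by line, so it also works if the log entries were pasted without line breaks.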