Reply
RK
Explorer
Posts: 9
Registered: ‎09-24-2015

Container killed with NMToken errors

[ Edited ]

Hi,

 

We are trying to move from MR1 to YARN, its in version 5.5.1. We are seeing many applications mapreduce, pig getting killed due to NMtoken errors. It almost looks like AM is getting killed, and as the fair scheduler queues are getting filled this started happening. We are trying to simulate this in test, still not succesfull.

 

logs for one container.

 

2018-03-13 17:12:45,869 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1520985320295_0207_01_000002 to attempt_1520985320295_0207_m_000000_0

 

2018-03-13 17:12:46,066 ERROR [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1520985320295_0207_01_000002 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for dm520.sjc2.turn.com:8041
        at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
        at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:244)
        at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:412)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:142)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:373)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

 

2018-03-13 17:12:46,071 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1520985320295_0207_m_000000_0: Container launch failed for container_1520985320295_0207_01_000002 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for dm520.sjc2.turn.com:8041
        at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
        at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:244)
        at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:412)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:142)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:373)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

 

2018-03-13 17:12:47,017 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1520985320295_0207_01_000002

2018-03-13 17:12:47,017 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_1520985320295_0207_01_000002

 

RK
Explorer
Posts: 9
Registered: ‎09-24-2015

Re: Container killed with NMToken errors

This was issue with thirdparty monitoring rpm and configs that were not backward compatible. It was reproducable and fixed.

Announcements