Member since 03-05-2018
Posts: 6
Kudos received: 0
Solutions: 1

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 1625 | 07-26-2018 06:37 AM |
07-26-2018 06:37 AM
I have resolved the issue. All resources were 100% utilised because of a security breach: a cron job was using the YARN service to consume cluster resources. Resolution: I closed all publicly exposed ports and IPs and deleted the cron jobs from /var/spool/cron/crontabs. Fortunately, it was just a test cluster, and the network admin had opened the ports only for a while. So don't keep any ports public in your cluster.
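Before deleting anything, it can help to list what is actually scheduled. Below is a minimal sketch, assuming a Debian/Ubuntu-style per-user spool at /var/spool/cron/crontabs (RHEL/CentOS use /var/spool/cron instead), that prints the active lines of each user's crontab for manual review. The function name `list_cron_entries` is hypothetical, reading the spool normally requires root, and system-wide jobs in /etc/crontab and /etc/cron.d are not covered.

```python
from pathlib import Path


def list_cron_entries(spool_dir="/var/spool/cron/crontabs"):
    """Return {user: [line, ...]} for every per-user crontab in spool_dir.

    Assumes the Debian-style spool layout: one file per user, named after
    the user. Blank lines and '#' comments are skipped, so only schedule
    lines and variable assignments remain.
    """
    entries = {}
    spool = Path(spool_dir)
    if not spool.is_dir():
        # Missing spool (e.g. a RHEL host): nothing to report.
        return entries
    for crontab in sorted(spool.iterdir()):
        if not crontab.is_file():
            continue
        lines = []
        with crontab.open(encoding="utf-8", errors="replace") as fh:
            for raw in fh:
                line = raw.strip()
                if line and not line.startswith("#"):
                    lines.append(line)
        if lines:
            entries[crontab.name] = lines
    return entries
```

Call it as root, for example `for user, lines in list_cron_entries().items(): print(user, lines)`, and compare each entry against the jobs you expect before removing anything.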
07-23-2018 12:18 PM
All NodeManagers go into the stopped state within a couple of seconds of starting up. After I start a NodeManager manually, its status shows as active, but it still ends up in the stopped state. All jobs remain in the ACCEPTED state.
I found the following error in the NodeManager logs:
2018-07-23 17:23:28,988 ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(88)) - Unable to recover container container_e101_1532344242009_0069_01_000001
java.io.IOException: Timeout while waiting for exit code from container_e101_1532344242009_0069_01_000001
at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:205)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:83)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-07-23 17:23:28,989 WARN launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(106)) - Recovered container exited with a non-zero exit code 154
2018-07-23 17:23:28,991 INFO container.ContainerImpl (ContainerImpl.java:handle(1136)) - Container container_e101_1532344242009_0069_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2018-07-23 17:23:28,991 INFO launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(371)) - Cleaning up container container_e101_1532344242009_0069_01_000001
2018-07-23 17:23:29,006 ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(88)) - Unable to recover container container_e101_1532344242009_0071_01_000001
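When many containers fail recovery, a summary is easier to read than the raw log. Below is a minimal sketch that scans NodeManager log text for the three message shapes in the excerpt above: failed recoveries, non-zero exit codes, and transitions to EXITED_WITH_FAILURE. The regular expressions are derived only from this excerpt and may not match other Hadoop versions; the function name `summarize_recovery` is hypothetical, and the sketch does not interpret what a given exit code means.

```python
import re
from collections import Counter

# Patterns copied from the log excerpt; other Hadoop versions may word
# these messages differently.
UNRECOVERED = re.compile(r"Unable to recover container (container_\S+)")
EXIT_CODE = re.compile(r"Recovered container exited with a non-zero exit code (\d+)")
FAILED = re.compile(r"Container (container_\S+) transitioned from \w+ to EXITED_WITH_FAILURE")


def summarize_recovery(log_text):
    """Collect container IDs that failed recovery, exit-code counts,
    and containers that transitioned to EXITED_WITH_FAILURE."""
    unrecovered, failed = [], []
    exit_codes = Counter()
    for line in log_text.splitlines():
        if m := UNRECOVERED.search(line):
            unrecovered.append(m.group(1))
        elif m := EXIT_CODE.search(line):
            exit_codes[int(m.group(1))] += 1
        elif m := FAILED.search(line):
            failed.append(m.group(1))
    return {"unrecovered": unrecovered, "exit_codes": exit_codes, "failed": failed}
```

For example, pass it the contents of a NodeManager log file read with `Path(...).read_text()`; the log's location depends on your installation.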
Labels:
- Apache YARN
- Cloudera Manager