Unexpected exits in NodeManager

Expert Contributor

I just increased the Java heap memory for the NameNode and Secondary NameNode to 5 GiB; it was set to something like 718 MB before. I had started getting containers killed with exit code 137, which is why I increased the heap memory.

After increasing the heap memory and restarting the services, I am getting unexpected exits from the NodeManager. How do I debug and fix this?

How do I check how much memory is available for the NameNode and Secondary NameNode?

Edit: I reverted the heap memory for the NameNode and Secondary NameNode to 784 MiB, and yet it is still exiting unexpectedly. Please suggest what to check.

Because the NodeManager keeps exiting, my Sqoop job starts reporting the error that the output directory already exists. I believe this is because the job tries to create it again once the NodeManager comes back up.
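
(A common workaround for the Sqoop side effect, assuming this is a plain Sqoop import, is to clear the stale target directory left behind by the failed attempt before re-running the job; the names below are placeholders.)

# Let Sqoop delete the existing target directory itself before importing
sqoop import --connect <jdbc-url> --table <table> --target-dir <output-dir> --delete-target-dir

# Or remove it manually before re-running the job
hdfs dfs -rm -r -skipTrash <output-dir>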

11 REPLIES

Rising Star

My guess is that the node does not have enough memory available for all the services. You can confirm and debug further by checking the logs in /var/log/messages and also in the /var/log/hadoop-yarn/yarn folder.
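
For example, something along these lines (the NodeManager log file name varies by distribution, so the wildcard is only illustrative):

# Kernel OOM-killer activity in the system log usually explains exit code 137
grep -iE 'out of memory|killed process' /var/log/messages

# Last few hundred lines of the NodeManager log around the most recent exit
tail -n 200 /var/log/hadoop-yarn/yarn/*NODEMANAGER*.log*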

Expert Contributor

How do I check how much memory is available for the NameNode and Secondary NameNode?

Rising Star

You can check memory usage on the node with the 'top' command and look for the NameNode PID. The RES column gives an approximate value of the resident memory used by the process.

If you want to check the Java heap usage of the NameNode, you can try:

jmap -heap <pid>
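
For example (a sketch assuming the JDK tools are on the PATH and the NameNode runs as the hdfs user; jmap -heap is the JDK 8 syntax):

# Find the NameNode PID
pgrep -f org.apache.hadoop.hdfs.server.namenode.NameNode

# Inspect its current heap usage and configured limits, attaching as the owning user
sudo -u hdfs jmap -heap <pid>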

Expert Contributor

@Tarun Parimi Thanks. I get:

37706 yarn 20 0 323600 17248 32 S 802.7 0.0 37577:11 suppoie

for the yarn user; 17248 is the value in the RES column.

What I actually want to know is what value I can assign to the Java heap for the NameNode and Secondary NameNode so that it won't exit. The server has 64 GB of memory (AWS m4*4), running CentOS 7. On the machine I have HDFS, Hive, Hue, Oozie, Sentry, Sqoop, and YARN.
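
(For reference, a quick way to see what is actually left on the node before picking heap sizes; this is only a sketch, not a sizing rule.)

# Overall memory on the node and how much is free or cached
free -h

# Resident memory of the Java services, largest first (RSS is reported in KiB)
ps -eo rss,cmd --sort=-rss | grep [j]ava | awk '{printf "%.1f GiB  %s\n", $1/1048576, $2}' | head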

Expert Contributor

The NodeManager is still continuously exiting unexpectedly.

Expert Contributor

Edit: I reverted the heap memory for the NameNode and Secondary NameNode to 784 MiB, and yet it is still exiting unexpectedly. Please suggest what to check.

Rising Star

Have you checked /var/log/messages? Can you also check and share the NodeManager log, which is in the /var/log/hadoop-yarn/yarn folder?

Expert Contributor

From /var/log/messages:
Apr 17 01:05:05 ip-172-31-4-192 systemd: Stopping user-986.slice.
Apr 17 01:06:01 ip-172-31-4-192 systemd: Created slice user-986.slice.
Apr 17 01:06:01 ip-172-31-4-192 systemd: Starting user-986.slice.
Apr 17 01:06:01 ip-172-31-4-192 systemd: Started Session 133520 of user yarn.
Apr 17 01:06:01 ip-172-31-4-192 systemd: Starting Session 133520 of user yarn.
Apr 17 01:06:04 ip-172-31-4-192 systemd: Removed slice user-986.slice.
Apr 17 01:06:04 ip-172-31-4-192 systemd: Stopping user-986.slice.
Apr 17 01:06:18 ip-172-31-4-192 kernel: net_ratelimit: 14 callbacks suppressed
Apr 17 01:07:01 ip-172-31-4-192 systemd: Created slice user-986.slice.
Apr 17 01:07:01 ip-172-31-4-192 systemd: Starting user-986.slice.
Apr 17 01:07:01 ip-172-31-4-192 systemd: Started Session 133521 of user yarn.
Apr 17 01:07:01 ip-172-31-4-192 systemd: Starting Session 133521 of user yarn.
Apr 17 01:07:05 ip-172-31-4-192 systemd: Removed slice user-986.slice.
Apr 17 01:07:05 ip-172-31-4-192 systemd: Stopping user-986.slice.
Apr 17 01:08:01 ip-172-31-4-192 systemd: Created slice user-986.slice.
Apr 17 01:08:01 ip-172-31-4-192 systemd: Starting user-986.slice.
Apr 17 01:08:01 ip-172-31-4-192 systemd: Started Session 133522 of user yarn.
Apr 17 01:08:01 ip-172-31-4-192 systemd: Starting Session 133522 of user yarn.
Apr 17 01:08:04 ip-172-31-4-192 systemd: Removed slice user-986.slice.
Apr 17 01:08:04 ip-172-31-4-192 systemd: Stopping user-986.slice.
Apr 17 01:09:01 ip-172-31-4-192 systemd: Created slice user-986.slice.
Apr 17 01:09:01 ip-172-31-4-192 systemd: Starting user-986.slice.
Apr 17 01:09:01 ip-172-31-4-192 systemd: Started Session 133523 of user yarn.
Apr 17 01:09:01 ip-172-31-4-192 systemd: Starting Session 133523 of user yarn.
Apr 17 01:09:05 ip-172-31-4-192 systemd: Removed slice user-986.slice.
Apr 17 01:09:05 ip-172-31-4-192 systemd: Stopping user-986.slice.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Started Session 133524 of user rstudio.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Starting Session 133524 of user rstudio.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Started Session 133525 of user root.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Starting Session 133525 of user root.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Created slice user-986.slice.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Starting user-986.slice.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Started Session 133526 of user yarn.
Apr 17 01:10:01 ip-172-31-4-192 systemd: Starting Session 133526 of user yarn.
Apr 17 01:10:05 ip-172-31-4-192 systemd: Removed slice user-986.slice.
Apr 17 01:10:05 ip-172-31-4-192 systemd: Stopping user-986.slice.
Apr 17 01:11:01 ip-172-31-4-192 systemd: Created slice user-986.slice.
Apr 17 01:11:01 ip-172-31-4-192 systemd: Starting user-986.slice.
Apr 17 01:11:01 ip-172-31-4-192 systemd: Started Session 133527 of user yarn.
Apr 17 01:11:01 ip-172-31-4-192 systemd: Starting Session 133527 of user yarn.
Apr 17 01:11:04 ip-172-31-4-192 systemd: Removed slice user-986.slice.
Apr 17 01:11:04 ip-172-31-4-192 systemd: Stopping user-986.slice.

From the hadoop-yarn folder:

2018-04-17 01:02:23,162 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: FINISH_APPLICATION sent to absent application application_1519975798846_110997
2018-04-17 01:02:23,162 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: FINISH_APPLICATION sent to absent application application_1519975798846_111000
2018-04-17 01:02:23,162 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: FINISH_APPLICATION sent to absent application application_1519975798846_111010
2018-04-17 01:02:23,162 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: FINISH_APPLICATION sent to absent application application_1519975798846_111011
2018-04-17 01:02:23,162 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: FINISH_APPLICATION sent to absent application application_1519975798846_111013
2018-04-17 01:02:23,162 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: FINISH_APPLICATION sent to absent application application_1519975798846_111025

Expert Contributor

Rising Star

@Simran Kaur The log snippets don't indicate the problem you are facing. Can you post or attach any ERROR or failure messages in the log?
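
For example, something like this pulls out only the ERROR and FATAL lines (the log file name below is illustrative; the exact name depends on the distribution):

grep -E ' ERROR | FATAL ' /var/log/hadoop-yarn/yarn/*NODEMANAGER*.log* | tail -n 50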

Expert Contributor

@Tarun Parimi Here's the warning I get:

Apr 17, 3:03:05.946 PM INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor Deleting path : /yarn/container-logs/application_1523905807460_4113/container_1523905807460_4113_01_000001/stderr
Apr 17, 3:03:05.963 PM INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor Deleting path : /yarn/container-logs/application_1523905807460_4113
Apr 17, 3:03:07.091 PM WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor Exit code from container container_1523905807460_4129_01_000001 is : 137
Apr 17, 3:03:07.091 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container Container container_1523905807460_4129_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
Apr 17, 3:03:07.091 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch Cleaning up container container_1523905807460_4129_01_000001
Apr 17, 3:03:07.111 PM INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor Deleting absolute path : /yarn/nm/usercache/hue/appcache/application_1523905807460_4129/container_1523905807460_4129_01_000001
Apr 17, 3:03:07.112 PM WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger USER=hue OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1523905807460_4129 CONTAINERID=container_1523905807460_4129_01_000001
Apr 17, 3:03:07.112 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container Container container_1523905807460_4129_01_000001 transitioned from EXITED_WITH_FAILURE to DONE
Apr 17, 3:03:07.112 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application Removing container_1523905807460_4129_01_000001 from application application_1523905807460_4129
Apr 17, 3:03:07.112 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl Considering container container_1523905807460_4129_01_000001 for log-aggregation
Apr 17, 3:03:07.112 PM INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices Got event CONTAINER_STOP for appId application_1523905807460_4129
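
For reference, exit code 137 is 128 + 9, i.e. the container JVM was killed with SIGKILL; that usually points at either the kernel OOM killer or YARN enforcing a container memory limit. Assuming log aggregation is enabled, the application's own diagnostics for the failed run can be pulled with the application ID from the log above:

yarn logs -applicationId application_1523905807460_4129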

