Node manager immediately goes down after starting up via Ambari

Contributor

I am trying to start the HDFS service from Ambari. All the sub-components come up except the NodeManager service under HDFS.


I am seeing the error below in /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-<Node_Name>.log:


2019-05-17 13:36:03,850 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(217)) - NodeManager metrics system stopped.
2019-05-17 13:36:03,850 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(606)) - NodeManager metrics system shutdown complete.
2019-05-17 13:36:03,851 FATAL nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(549)) - Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: java.net.BindException: Address already in use
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.serviceInit(AuxiliaryServiceWithCustomClassLoader.java:65)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:162)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:245)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:291)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:546)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:594)
Caused by: java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:128)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:504)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1226)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:495)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:480)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:213)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:355)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:464)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
    at java.lang.Thread.run(Thread.java:748)
2019-05-17 13:36:03,853 INFO nodemanager.NodeManager (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************


Observations made

  • No process is listening on the NodeManager ports, 0.0.0.0:45454 and 8042.
  • No NodeManager process is running either: ps -ef | grep nodemanager returns an empty result.


Any help would be appreciated.

1 REPLY

Master Mentor

@ajay vembu

Here is the cause of the problem:

Caused by: java.net.BindException: Address already in use

This means some process is already using the default NodeManager port. You will need to kill that process, after which your NodeManager should start successfully. To find the process, run one of the following:

sudo lsof -i -P -n | grep LISTEN  

or

sudo netstat -ltup 
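
Either listing can be long on a busy host. As a minimal sketch, the helper below keeps only the lines whose local address ends in one of the ports of interest. The name listeners_on is hypothetical and introduced here for illustration, and 45454 and 8042 are simply the ports mentioned in the question. Note that the trace in the question fails inside AuxServices.serviceInit, so the port in conflict may belong to an auxiliary service rather than to the NodeManager itself. That is an inference from the trace, not something confirmed in this thread, so it may be worth adding any aux-service ports configured in yarn-site.xml to the list.

```shell
# listeners_on PORT [PORT...]
# Reads a listener listing on stdin (lsof or netstat output) and prints only
# the lines whose local address ends in one of the given ports.
# Hypothetical helper for illustration; not part of Hadoop or Ambari.
listeners_on() {
  # Join the arguments into a regex alternation, e.g. "45454|8042".
  ports=$(printf '%s|' "$@")
  ports=${ports%|}
  # Match ":PORT" followed by whitespace, so 8042 does not match 18042 or 80420.
  grep -E ":(${ports})[[:space:]]"
}

# Example invocations (root is needed to see processes owned by other users):
#   sudo lsof -i -P -n | grep LISTEN | listeners_on 45454 8042
#   sudo netstat -ltnup | listeners_on 45454 8042
```

The netstat example adds -n so that ports print as numbers rather than service names, which the numeric match relies on. An empty result for every port you try suggests the conflict is on a port not yet in the list.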

This looks like a simple port collision. Can you first stop the Ambari Metrics HMaster process? At times it grabs port 45454, which blocks the NodeManager from starting.

Please do that and report back.