how to start hadoop-yarn-nodemanager on multinode cluster runs CDH5 with YARN

Explorer

Folks,

I have a serious problem. I can locate exactly where it fails, but I have no idea what the solution is.

I just set up a multi-node cluster with CDH5/YARN. Everything should be fine, since I followed the Cloudera documentation to get the cluster running.

I cannot get hadoop-yarn-nodemanager to run on any of the datanodes at all. Once I start it, it stops again.

 

[root@hdmachine3 conf.my_cluster]# service hadoop-yarn-nodemanager status
Hadoop nodemanager is dead and pid file exists             [FAILED]

 

I tried to start it directly with "yarn nodemanager"; here is the output I got:

 

14/08/03 23:49:10 INFO mortbay.log: Extract jar:file:/usr/lib/hadoop-yarn/hadoop-yarn-common-2.3.0-cdh5.1.0.jar!/webapps/node to /tmp/Jetty_0_0_0_0_8042_node____19tj0x/webapp
14/08/03 23:49:10 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:8042
14/08/03 23:49:10 INFO webapp.WebApps: Web app /node started at 8042
14/08/03 23:49:11 INFO webapp.WebApps: Registered webapp guice modules
14/08/03 23:49:11 INFO client.RMProxy: Connecting to ResourceManager at hdmachine1.example.com/128.243.29.224:8031
14/08/03 23:49:11 INFO nodemanager.NodeStatusUpdaterImpl: Registering with RM using finished containers :[]
14/08/03 23:49:12 INFO ipc.Client: Retrying connect to server: hdmachine1.example.com/128.243.29.224:8031. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/08/03 23:49:13 INFO ipc.Client: Retrying connect to server: hdmachine1.example.com/128.243.29.224:8031. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/08/03 23:49:14 INFO ipc.Client: Retrying connect to server: hdmachine1.example.com/128.243.29.224:8031. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/08/03 23:49:15 INFO ipc.Client: Retrying connect to server: hdmachine1.example.com/128.243.29.224:8031. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/08/03 23:49:16 INFO ipc.Client: Retrying connect to server: hdmachine1.example.com/128.243.29.224:8031. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/08/03 23:49:24 INFO ipc.Client: Retrying connect to server: hdmachine1.example.com/128.243.29.224:8031. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/08/03 23:49:25 INFO ipc.Client: Retrying connect to server: hdmachine1.example.com/128.243.29.224:8031. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
14/08/03 23:49:43 ERROR nodemanager.NodeStatusUpdaterImpl: Unexpected error starting NodeStatusUpdater
java.net.NoRouteToHostException: No Route to Host from  hdmachine3.example.com/128.243.29.227 to hdmachine1.example.com:8031 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:756)
	at org.apache.hadoop.ipc.Client.call(Client.java:1413)
	at org.apache.hadoop.ipc.Client.call(Client.java:1362)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy23.registerNodeManager(Unknown Source)
	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy24.registerNodeManager(Unknown Source)
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:247)
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:179)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:197)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:352)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:398)
Caused by: java.net.NoRouteToHostException: No route to host
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1461)
	at org.apache.hadoop.ipc.Client.call(Client.java:1380)
	... 19 more
14/08/03 23:49:43 INFO service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.NoRouteToHostException: No Route to Host from  hdmachine3.example.com/128.243.29.227 to hdmachine1.example.com:8031 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.NoRouteToHostException: No Route to Host from  hdmachine3.example.com/128.243.29.227 to hdmachine1.example.com:8031 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:185)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:197)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:352)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:398)
Caused by: java.net.NoRouteToHostException: No Route to Host from  hdmachine3.example.com/128.243.29.227 to hdmachine1.example.com:8031 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:756)
	at org.apache.hadoop.ipc.Client.call(Client.java:1413)
	at org.apache.hadoop.ipc.Client.call(Client.java:1362)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy23.registerNodeManager(Unknown Source)
	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy24.registerNodeManager(Unknown Source)
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:247)
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:179)
	... 6 more
Caused by: java.net.NoRouteToHostException: No route to host
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1461)
	at org.apache.hadoop.ipc.Client.call(Client.java:1380)
	... 19 more
14/08/03 23:49:43 INFO service.AbstractService: Service NodeManager failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.NoRouteToHostException: No Route to Host from  hdmachine3.example.com/128.243.29.227 to hdmachine1.example.com:8031 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.NoRouteToHostException: No Route to Host from  hdmachine3.example.com/128.243.29.227 to hdmachine1.example.com:8031 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:185)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:197)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:352)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:398)
Caused by: java.net.NoRouteToHostException: No Route to Host from  hdmachine3.example.com/128.243.29.227 to hdmachine1.example.com:8031 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:756)
	at org.apache.hadoop.ipc.Client.call(Client.java:1413)
	at org.apache.hadoop.ipc.Client.call(Client.java:1362)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy23.registerNodeManager(Unknown Source)
	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy24.registerNodeManager(Unknown Source)
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:247)
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:179)
	... 6 more
Caused by: java.net.NoRouteToHostException: No route to host
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1461)
	at org.apache.hadoop.ipc.Client.call(Client.java:1380)
	... 19 more
14/08/03 23:49:43 INFO mortbay.log: Stopped SelectChannelConnector@0.0.0.0:8042
14/08/03 23:49:43 INFO ipc.Server: Stopping server on 45300
14/08/03 23:49:43 INFO ipc.Server: Stopping IPC Server listener on 45300
14/08/03 23:49:43 INFO logaggregation.LogAggregationService: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit
14/08/03 23:49:43 INFO ipc.Server: Stopping IPC Server Responder
14/08/03 23:49:43 WARN monitor.ContainersMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
14/08/03 23:49:43 INFO ipc.Server: Stopping server on 8040
14/08/03 23:49:43 INFO ipc.Server: Stopping IPC Server listener on 8040
14/08/03 23:49:43 INFO ipc.Server: Stopping IPC Server Responder
14/08/03 23:49:43 INFO localizer.ResourceLocalizationService: Public cache exiting
14/08/03 23:49:43 INFO impl.MetricsSystemImpl: Stopping NodeManager metrics system...
14/08/03 23:49:43 INFO impl.MetricsSystemImpl: NodeManager metrics system stopped.
14/08/03 23:49:43 INFO impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
14/08/03 23:49:43 FATAL nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.NoRouteToHostException: No Route to Host from  hdmachine3.example.com/128.243.29.227 to hdmachine1.example.com:8031 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:185)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:197)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:352)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:398)
Caused by: java.net.NoRouteToHostException: No Route to Host from  hdmachine3.example.com/128.243.29.227 to hdmachine1.example.com:8031 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:756)
	at org.apache.hadoop.ipc.Client.call(Client.java:1413)
	at org.apache.hadoop.ipc.Client.call(Client.java:1362)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy23.registerNodeManager(Unknown Source)
	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy24.registerNodeManager(Unknown Source)
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:247)
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:179)
	... 6 more
Caused by: java.net.NoRouteToHostException: No route to host
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1461)
	at org.apache.hadoop.ipc.Client.call(Client.java:1380)
	... 19 more
14/08/03 23:49:43 INFO nodemanager.NodeManager: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at hdmachine3.example.com/128.243.29.227
************************************************************/

 

It is pretty clear that the nodes can't find their way to the ResourceManager, but I don't know what else I should do. I followed the troubleshooting tips and guidelines, but nothing came up.

 

Can anyone suggest anything to deal with this? I am sure there is no problem at all with the network, hostnames, SELinux, iptables, and so on. All of these machines are local, so I do not have any security issues to worry about.

 

Thanks in advance,

Cheers,

 

14 REPLIES

Re: how to start hadoop-yarn-nodemanager on multinode cluster runs CDH5 with YARN

Expert Contributor

So the ResourceManager is definitely up and running?

 

I know you say there shouldn't be anything wrong network-wise, but can you verify anyway? I've done that in the past: sworn up and down that it's not the network, then actually gone to verify, and sure enough something was blocking.

 

On RHEL 6 you can check with the following from hdmachine3.example.com:

 

nc -z 128.243.29.224 8031

 

I think on other flavors it may be something like:

 

ncat 128.243.29.224 8031

 

Other random things: is DNS correct? Is hdmachine1.example.com 128.243.29.224?

 

Also, the 128.x.x.x range is not in the private IP space, so could your request be getting routed to the internet?
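
For nodes where nc is not installed, the same reachability check can be sketched in Python. This is only an illustration, not part of CDH: the function name can_connect and the 3-second timeout are assumptions chosen for the example.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake,
        # which is roughly what "nc -z host port" tests.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, "no route to host",
        # and name-resolution failures.
        return False
```

Run from hdmachine3, can_connect("128.243.29.224", 8031) would correspond to the nc check against the ResourceManager port. Calling it with the hostname instead of the IP also exercises name resolution on that node.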

 

 

 

 

Re: how to start hadoop-yarn-nodemanager on multinode cluster runs CDH5 with YARN

Explorer

Thanks for your reply,

Absolutely, the ResourceManager is running on the NameNode host:

 

[root@hdmachine1 ncdc]# service hadoop-yarn-resourcemanager status
Hadoop resourcemanager is running                          [  OK  ]

And for the connection from hdmachine3 to the ResourceManager:

 

[root@hdmachine3 conf.my_cluster]# nc -z 128.243.29.224 8031
Connection to 128.243.29.224 8031 port [tcp/*] succeeded!

I even dropped all iptables rules, so that it looks like this on all nodes, including the ResourceManager node:

 

[root@hdmachine1 ncdc]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

 

Any suggestions? :D

 

Re: how to start hadoop-yarn-nodemanager on multinode cluster runs CDH5 with YARN

Expert Contributor

Very strange indeed!

 

Is there anything helpful in the ResourceManager log when you attempt to start up the NodeManager?

Re: how to start hadoop-yarn-nodemanager on multinode cluster runs CDH5 with YARN

Explorer

Indeed,

I have been searching the internet for about a week. I thought CDH would have a much better logging system!

 

Re: how to start hadoop-yarn-nodemanager on multinode cluster runs CDH5 with YARN

Explorer

Anyway, it would be interesting to know what "hdmachine1.example.com" resolves to on your hdmachine3 machine. It's indeed very strange to have a public IP range there, and probably not what you want. So maybe you could try the nc check again with the hostname, and run "getent hosts hdmachine1". Also check that all hosts really are in the domain "example.com".

 

Does it work if you use IP addresses instead of hostnames in yarn-site.xml?

 

BR

Marc
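
As a rough equivalent of the getent lookup, a short Python sketch can show what a name resolves to through the system resolver, which on a typical Linux setup consults /etc/hosts before DNS. The helper name resolve_ipv4 is made up for this example.

```python
import socket
from typing import List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses the local resolver gives."""
    try:
        infos = socket.getaddrinfo(
            hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        # The name could not be resolved at all on this node.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})
```

Running resolve_ipv4("hdmachine1.example.com") and resolve_ipv4(socket.getfqdn()) on every node, then comparing the answers across nodes, is one way to spot a machine that resolves a name differently from the rest.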

Re: how to start hadoop-yarn-nodemanager on multinode cluster runs CDH5 with YARN

Explorer

Actually, name resolution simply works because I set up the /etc/hosts file with all names and IPs.

Regarding yarn-site.xml: using the IP instead of the hostname gives the same problem. Once I start the NodeManager, it shuts down again. I am going crazy :@

 

 

Re: how to start hadoop-yarn-nodemanager on multinode cluster runs CDH5 with YARN

Expert Contributor

Could you post your hosts file entries? I'm wondering if maybe you are specifying the short names in the hosts file, so that when the FQDN is looked up it isn't defined there and the connection can't be made.

Re: how to start hadoop-yarn-nodemanager on multinode cluster runs CDH5 with YARN

Explorer
[root@hdmachine1 etc]# cat /etc/hosts

128.243.29.224  hdmachine1.example.com  node1
128.243.29.226  hdmachine3.example.com  node3
128.243.29.227  hdmachine4.example.com  node4
128.243.29.228  hdmachine5.example.com  node5
128.243.29.229  hdmachine6.example.com  node6
128.243.29.230  hdmachine7.example.com  node7
128.243.29.231  hdmachine8.example.com  node8
128.243.29.232  hdmachine9.example.com  node9
128.243.29.233  hdmachine10.example.com  node10
128.243.29.234  hdmachine11.example.com  node11
128.243.29.235  hdmachine12.example.com  node12

 

Re: how to start hadoop-yarn-nodemanager on multinode cluster runs CDH5 with YARN

Expert Contributor

So, from your log:

 

hdmachine3.example.com/128.243.29.227

 

But from your hosts file:

 

128.243.29.226  hdmachine3.example.com  node3

 

It looks like you'll need to make sure your hosts files are the same on every server and then restart every process. You'll need to restart because Java caches DNS lookups.
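
The comparison between the log and the hosts file can also be automated. The sketch below is only an illustration: the names parse_hosts, LOCAL_ID and find_mismatch are invented here, and the regular expression assumes the "from  host/ip" form seen in the NoRouteToHostException message in this thread.

```python
import re
from typing import Dict, Optional, Tuple


def parse_hosts(text: str) -> Dict[str, str]:
    """Map every hostname and alias in /etc/hosts-style text to its IP."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first entry for a name
    return mapping


# Matches the "from  host/ip" part of the exception message (assumed format).
LOCAL_ID = re.compile(r"from\s+(?P<host>[\w.-]+)/(?P<ip>\d{1,3}(?:\.\d{1,3}){3})")


def find_mismatch(log_line: str, hosts_text: str) -> Optional[Tuple[str, str, str]]:
    """Return (host, ip_in_log, ip_in_hosts) if the two disagree, else None."""
    match = LOCAL_ID.search(log_line)
    if not match:
        return None
    host, logged_ip = match.group("host"), match.group("ip")
    hosts_ip = parse_hosts(hosts_text).get(host)
    if hosts_ip is not None and hosts_ip != logged_ip:
        return host, logged_ip, hosts_ip
    return None
```

Fed the exception line from hdmachine3 and the hosts file posted from hdmachine1, find_mismatch reports hdmachine3.example.com with 128.243.29.227 in the log against 128.243.29.226 in the hosts file, which is the discrepancy pointed out above.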