ResourceManager does not start

New Contributor

I installed Hadoop (HDP 2.5.3) with Ambari on 4 VMs (1 Ambari Server and 3 Ambari clients, with the DNS entries server, node0, node1 and node2), running HDFS, YARN, MapReduce and ZooKeeper.

However, YARN does not start. When I start the ResourceManager on node1, I get the following error:

resource_management.core.exceptions.ExecutionFailed: Execution of 'curl -sS -L -w '%{http_code}' -X GET 'http://node0:50070/webhdfs/v1/ats/done/?op=GETFILESTATUS&user.name=hdfs' 1>/tmp/tmpgsiRLj 2>/tmp/tmpMENUFa' returned 7. curl: (7) Failed to connect to node0 port 50070: connection refused 000

The App Timeline Server and History Server on node1 don't start either. ZooKeeper, NameNode, DataNode and NodeManager on node0 are up. The nodes can reach each other (tested with ping, by IP and by DNS name), so that shouldn't be the problem.
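A note on the ping test: ping only exercises ICMP, so it does not show whether anything accepts TCP connections on port 50070. The following is a minimal Python sketch for probing the port from node1. `probe_tcp` is a hypothetical helper name; the host and port are taken from the curl error above.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify the result of one TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that address/port.
        return "refused"
    except socket.timeout:
        # No answer at all, which often points at a firewall or routing issue.
        return "timeout"
    except socket.gaierror:
        # The host name could not be resolved.
        return "unresolved"
    except OSError as exc:
        return f"error: {exc}"
```

Running `probe_tcp("node0", 50070)` on node1 and again on node0 itself may help narrow things down: "refused" from node1 but "open" on node0 would suggest the NameNode web port is only reachable locally.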

Hopefully someone can help me. I'm really new to this topic and not really familiar with the system.

4 REPLIES

Master Mentor

@Alexander E

You mentioned that ZooKeeper, NameNode, DataNode and NodeManager on node0 are up and that the nodes can reach each other.

- But is the NameNode healthy? I mean, do you see any errors in the NameNode log?

- Sometimes the NameNode process is running but cannot respond, for example because it is running out of memory or because an OS resource is exhausted (such as too many open sockets). So it is better to check the NameNode log.
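As a rough way to do that check, here is a small Python sketch that filters a log for lines worth reading first. The marker list (`ERROR`, `FATAL`, `OutOfMemoryError`, `Too many open files`) and the helper name `find_problem_lines` are assumptions for illustration, not an exhaustive or official list.

```python
import re

# Assumed markers for "unhealthy" log lines; extend as needed.
PROBLEM_PATTERN = re.compile(
    r"\b(ERROR|FATAL)\b|OutOfMemoryError|Too many open files"
)


def find_problem_lines(lines):
    """Return the log lines that contain one of the assumed problem markers."""
    return [line.rstrip("\n") for line in lines if PROBLEM_PATTERN.search(line)]
```

It can be fed an open file object, for example `find_problem_lines(open(path))`, where `path` is wherever your NameNode log lives on node0 (the location depends on the installation).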

New Contributor

Ambari says the namenode is running and healthy:

12369-namenode.png

So I restarted the NameNode and watched the log. There is a "RetriableException: NameNode still not started":

2017-02-12 16:24:58,249 INFO  namenode.NameNode (LogAdapter.java:info(47)) - STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   user = hdfs
STARTUP_MSG:   host = node0/127.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.7.3.2.5.3.0-37
STARTUP_MSG:   classpath = ...
STARTUP_MSG:   build = git@github.com:hortonworks/hadoop.git -r 9828acfdec41a121f0121f556b09e2d112259e92; compiled by 'jenkins' on 2016-11-29T18:37Z
STARTUP_MSG:   java = 1.8.0_77
************************************************************/
2017-02-12 16:24:58,268 INFO  namenode.NameNode (LogAdapter.java:info(47)) - registered UNIX signal handlers for [TERM, HUP, INT]
2017-02-12 16:24:58,271 INFO  namenode.NameNode (NameNode.java:createNameNode(1600)) - createNameNode []
2017-02-12 16:24:58,709 INFO  impl.MetricsConfig (MetricsConfig.java:loadFirst(112)) - loaded properties from hadoop-metrics2.properties
...


2017-02-12 16:25:01,416 INFO  ipc.Server (Server.java:run(1045)) - IPC Server Responder: starting
2017-02-12 16:25:01,417 INFO  ipc.Server (Server.java:run(881)) - IPC Server listener on 8020: starting
2017-02-12 16:25:01,429 INFO  namenode.NameNode (NameNode.java:startCommonServices(876)) - NameNode RPC up at: node0/127.0.1.1:8020
2017-02-12 16:25:01,430 INFO  namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1130)) - Starting services required for active state
2017-02-12 16:25:01,436 INFO  blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(161)) - Starting CacheReplicationMonitor with interval 30000 milliseconds
2017-02-12 16:25:03,040 INFO  ipc.Server (Server.java:logException(2401)) - IPC Server handler 0 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 127.0.0.1:42972 Call#687 Retry#0
org.apache.hadoop.ipc.RetriableException: NameNode still not started
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:2057)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1414)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:118)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:29064)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
2017-02-12 16:25:03,080 INFO  fs.TrashPolicyDefault (TrashPolicyDefault.java:<init>(224)) - The configured checkpoint interval is 0 minutes. Using an interval of 360 minutes that is used for deletion instead
...

I cannot post the complete log, so you can find it here: complete log
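One detail in the log excerpt may be worth a closer look: the startup banner reports `host = node0/127.0.1.1`, and the RPC server comes up at `node0/127.0.1.1:8020`. If node0 resolves its own name to a loopback address, the NameNode could end up listening only on loopback, which would be consistent with "connection refused" from node1. This is a reading of the log, not a confirmed diagnosis. A minimal Python sketch that flags such banner lines follows; `loopback_hosts` and the regular expression are illustrative assumptions.

```python
import ipaddress
import re

# Matches banner lines of the form "STARTUP_MSG:   host = <name>/<ip>".
HOST_LINE = re.compile(r"STARTUP_MSG:\s+host = (?P<name>[^/\s]+)/(?P<ip>\S+)")


def loopback_hosts(lines):
    """Return (name, ip) pairs from host banner lines whose IP is loopback."""
    found = []
    for line in lines:
        match = HOST_LINE.search(line)
        if not match:
            continue
        try:
            address = ipaddress.ip_address(match.group("ip"))
        except ValueError:
            continue  # not a parseable IP address, skip the line
        if address.is_loopback:
            found.append((match.group("name"), match.group("ip")))
    return found
```

If this reports node0, comparing what node0 resolves to on node0 itself and on node1 (for example in `/etc/hosts`) would be a reasonable next step.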

Super Guru
@Alexander E

Please log in to the Ambari UI and check whether the NameNode service is started.

From the error below, it seems your NameNode is down:

 Failed to connect to node0 port 50070: connection refused

It can also happen that the NameNode takes a long time to start, and if YARN is started before that, it tries to connect to a NameNode that is still coming up.

In such scenarios, make sure the NameNode is up and then restart the YARN service.
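To make "wait until the NameNode is up" concrete, here is a hedged Python sketch that polls an HTTP endpoint until it answers. The helper names `webhdfs_responds` and `wait_until`, and the retry counts, are assumptions for illustration. It only checks that the web port answers, not that the NameNode has finished starting or left safe mode.

```python
import time
import urllib.error
import urllib.request


def webhdfs_responds(url, timeout=5.0):
    """Return True if the URL answers with any HTTP status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so the port is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return False


def wait_until(check, attempts=30, delay=10.0, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

With the URL from the original error, a call could look like `wait_until(lambda: webhdfs_responds("http://node0:50070/webhdfs/v1/ats/done/?op=GETFILESTATUS&user.name=hdfs"))`, run from node1 before restarting YARN in Ambari.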

New Contributor

Jay SenSharma also said it could be a problem with the NameNode, so I posted the namenode.log, where it says "RetriableException: NameNode still not started": https://community.hortonworks.com/comments/83058/view.html