Created 02-10-2017 11:12 AM
I installed Hadoop (HDP 2.5.3) on 4 VMs with Ambari (1 Ambari Server and 3 Ambari Clients; with the DNS entries server, node0, node1, node2) with HDFS, YARN, MapReduce and Zookeeper.
However, YARN doesn't want to start. When starting the Resource Manager on node1 I get the following error:
resource_management.core.exceptions.ExecutionFailed: Execution of 'curl -sS -L -w '%{http_code}' -X GET 'http://node0:50070/webhdfs/v1/ats/done/?op=GETFILESTATUS&user.name=hdfs' 1>/tmp/tmpgsiRLj 2>/tmp/tmpMENUFa' returned 7. curl: (7) Failed to connect to node0 port 50070: connection refused 000
App Timeline Server and History Server on node1 don't want to start either. Zookeeper, NameNode, DataNode and NodeManager on node0 are up. The nodes can reach each other (tried with ping, tested via IP and via DNS names), so that shouldn't be the problem.
Hopefully someone can help me. I'm really new to this topic and not really familiar with the system.
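A side note on the ping test: ping only shows that the hosts answer ICMP, not that anything accepts TCP connections on port 50070. Below is a minimal sketch that classifies a TCP connection attempt. The function name `probe_tcp` and the result strings are made up for illustration; only the host `node0` and port 50070 come from the curl error above.

```python
import socket

def probe_tcp(host, port, timeout=3.0):
    """Try one TCP connection to host:port and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that address/port.
        return "refused"
    except socket.timeout:
        # No answer at all, e.g. packets dropped by a firewall.
        return "timeout"
    except socket.gaierror:
        # The hostname could not be resolved.
        return "unresolved"
    except OSError as exc:
        return "error: %s" % exc

# Example call, to be run from node1 (values taken from the curl error):
#   print(probe_tcp("node0", 50070))
```

Running it from node1 and again on node0 itself may help narrow down whether the port is closed everywhere or only unreachable from other machines.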
Created 02-10-2017 11:16 AM
As you mentioned, Zookeeper, NameNode, DataNode and NodeManager on node0 are up, and the nodes can reach each other.
- But is the NameNode healthy? I mean, do you see any errors in the NameNode log?
- Sometimes the NameNode process is running but has run out of memory (or OS resources are exhausted, e.g. too many open sockets) and is therefore not able to respond. So it is better to check the NameNode log.
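To make the log check concrete, here is a rough sketch that filters log lines for the kinds of problems mentioned above. The pattern list is only an illustrative assumption, not an exhaustive set of NameNode failure messages, and the log path in the comment is a guess that has to be adapted to the actual installation.

```python
import re

# Illustrative patterns only (assumption): memory exhaustion, file/socket
# descriptor exhaustion, and any line logged at ERROR or FATAL level.
SUSPECT = re.compile(r"OutOfMemoryError|Too many open files|\bERROR\b|\bFATAL\b")

def suspicious_lines(lines):
    """Return the log lines that match one of the suspect patterns."""
    return [ln.rstrip("\n") for ln in lines if SUSPECT.search(ln)]

# Example usage; the path is hypothetical and depends on the installation:
#   with open("/var/log/hadoop/hdfs/namenode.log") as fh:
#       for hit in suspicious_lines(fh):
#           print(hit)
```

An empty result does not prove the NameNode is healthy; it only means none of these particular patterns appeared.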
Created on 02-12-2017 03:38 PM - edited 08-18-2019 03:50 AM
Ambari says the NameNode is running and healthy.
So I restarted the NameNode and watched the log. There is a "RetriableException: NameNode still not started":
2017-02-12 16:24:58,249 INFO namenode.NameNode (LogAdapter.java:info(47)) - STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: user = hdfs
STARTUP_MSG: host = node0/127.0.1.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.7.3.2.5.3.0-37
STARTUP_MSG: classpath = ...
STARTUP_MSG: build = git@github.com:hortonworks/hadoop.git -r 9828acfdec41a121f0121f556b09e2d112259e92; compiled by 'jenkins' on 2016-11-29T18:37Z
STARTUP_MSG: java = 1.8.0_77
************************************************************/
2017-02-12 16:24:58,268 INFO namenode.NameNode (LogAdapter.java:info(47)) - registered UNIX signal handlers for [TERM, HUP, INT]
2017-02-12 16:24:58,271 INFO namenode.NameNode (NameNode.java:createNameNode(1600)) - createNameNode []
2017-02-12 16:24:58,709 INFO impl.MetricsConfig (MetricsConfig.java:loadFirst(112)) - loaded properties from hadoop-metrics2.properties
...
2017-02-12 16:25:01,416 INFO ipc.Server (Server.java:run(1045)) - IPC Server Responder: starting
2017-02-12 16:25:01,417 INFO ipc.Server (Server.java:run(881)) - IPC Server listener on 8020: starting
2017-02-12 16:25:01,429 INFO namenode.NameNode (NameNode.java:startCommonServices(876)) - NameNode RPC up at: node0/127.0.1.1:8020
2017-02-12 16:25:01,430 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1130)) - Starting services required for active state
2017-02-12 16:25:01,436 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(161)) - Starting CacheReplicationMonitor with interval 30000 milliseconds
2017-02-12 16:25:03,040 INFO ipc.Server (Server.java:logException(2401)) - IPC Server handler 0 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 127.0.0.1:42972 Call#687 Retry#0
org.apache.hadoop.ipc.RetriableException: NameNode still not started
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:2057)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1414)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:118)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:29064)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
2017-02-12 16:25:03,080 INFO fs.TrashPolicyDefault (TrashPolicyDefault.java:<init>(224)) - The configured checkpoint interval is 0 minutes. Using an interval of 360 minutes that is used for deletion instead
...
I cannot post the complete log here, so you can find it here: complete log
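One detail in the log excerpt that might be worth a look: the startup banner reports `host = node0/127.0.1.1`, and the RPC server comes up at `node0/127.0.1.1:8020`. That address is in the loopback range. Whether this has anything to do with the refused connection from node1 is not established in this thread, so treat the following only as a sketch for flagging such a line. The helper names `startup_host` and `is_loopback` are made up for illustration.

```python
import ipaddress
import re

# Matches the "host = <name>/<ip>" part of the NameNode startup banner.
HOST_RE = re.compile(r"host\s*=\s*(?P<name>[^/\s]+)/(?P<ip>[0-9a-fA-F.:]+)")

def startup_host(log_text):
    """Return (hostname, ip) from the startup banner, or None if absent."""
    m = HOST_RE.search(log_text)
    if m is None:
        return None
    return m.group("name"), m.group("ip")

def is_loopback(ip):
    """True if the address is in a loopback range (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(ip).is_loopback

# Example with the banner line from the log above:
#   name, ip = startup_host("STARTUP_MSG: host = node0/127.0.1.1")
#   print(name, ip, is_loopback(ip))
```

If the check flags the address, comparing how `node0` resolves on node0 itself and on node1 would be a reasonable next step.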
Created 02-10-2017 12:33 PM
Please log in to the Ambari UI and check whether the NameNode service is started.
From the log line below, it seems your NameNode is down:
Failed to connect to node0 port 50070: connection refused
It can also happen that the NameNode takes a long time to start. If YARN is started before that, it tries to connect to a NameNode that is still coming up.
In such scenarios, make sure the NameNode is up and then restart the YARN service.
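The "wait for the NameNode, then restart YARN" idea can be sketched as a simple polling loop. This is only an assumption about how one might script the wait; it is not something Ambari provides under these names. `wait_until` and `port_open` are hypothetical helpers, and the timeout and interval values are arbitrary.

```python
import socket
import time

def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def wait_until(probe, timeout=300.0, interval=5.0):
    """Call probe() repeatedly until it returns True or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Example usage (host and port taken from the error message in this thread):
#   if wait_until(lambda: port_open("node0", 50070)):
#       print("NameNode web port is reachable, YARN can be restarted now")
#   else:
#       print("NameNode web port still not reachable, check the NameNode log")
```

The restart itself is still done from the Ambari UI; the loop only tells you when it makes sense to try.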
Created 02-12-2017 03:40 PM
Jay SenSharma also said it could be a problem with the namenode so I posted the namenode.log where it says: "RetriableException: NameNode still not started" https://community.hortonworks.com/comments/83058/view.html