Created 02-10-2017 11:12 AM
I installed Hadoop (HDP 2.5.3) on 4 VMs using Ambari (1 Ambari Server and 3 Ambari Agents, with the DNS names server, node0, node1 and node2), including HDFS, YARN, MapReduce and ZooKeeper.
However, YARN won't start. When I start the ResourceManager on node1, I get the following error:
resource_management.core.exceptions.ExecutionFailed: Execution of 'curl -sS -L -w '%{http_code}' -X GET 'http://node0:50070/webhdfs/v1/ats/done/?op=GETFILESTATUS&user.name=hdfs' 1>/tmp/tmpgsiRLj 2>/tmp/tmpMENUFa' returned 7. curl: (7) Failed to connect to node0 port 50070: connection refused
000
The App Timeline Server and the History Server on node1 won't start either. ZooKeeper, NameNode, DataNode and NodeManager on node0 are up. The nodes can reach each other (I tested with ping, using both IP addresses and DNS names), so that shouldn't be the problem.
I hope someone can help. I'm new to this topic and not yet familiar with the system.
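A note on the connectivity test: ping uses ICMP, while curl's exit code 7 ("Failed to connect") means the TCP connection to node0 on port 50070 was refused. That typically means nothing was accepting connections on that port at the address node0 resolved to, so a successful ping does not rule it out. The minimal sketch below (Python 3 assumed; `port_is_open` is a hypothetical helper, not part of Ambari or Hadoop) tests the TCP port itself, the way curl does:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Unlike ping (ICMP), this uses the same TCP path curl uses, so a
    "connection refused" shows up here as False.
    """
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts and name-resolution errors.
        return False
```

For example, run `port_is_open("node0", 50070)` on node0 and again on node1: if it is True on node0 but False on node1, the port is only reachable locally, which points at how the service is bound rather than at general network reachability.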
Created 02-10-2017 11:16 AM
You mentioned that ZooKeeper, NameNode, DataNode and NodeManager on node0 are up and that the nodes can reach each other.
- But is the NameNode healthy? That is, do you see any errors in the NameNode log?
- Sometimes the NameNode process is running but has run out of memory (or out of OS resources, such as too many open sockets), so it cannot respond. So it is better to check the NameNode log.
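A minimal sketch for scanning the NameNode log for those conditions, assuming Python 3. The pattern list and the names `find_suspicious_lines` and `scan_log` are illustrative assumptions; pass in the path of the NameNode log on your own host:

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical pattern list: strings that often indicate the NameNode process
# is up but unable to serve requests. Extend it as needed.
SUSPICIOUS = re.compile(
    r"OutOfMemoryError|Too many open files|RetriableException|\b(ERROR|FATAL)\b"
)


def find_suspicious_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for log lines matching SUSPICIOUS."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SUSPICIOUS.search(line)
    ]


def scan_log(path: str) -> List[Tuple[int, str]]:
    """Scan a log file; errors="replace" keeps going past undecodable bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_suspicious_lines(handle)
```

Matching on the exception class names rather than only on the log level matters here, because Hadoop logs some of these (such as the RetriableException) at INFO level.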
Created on 02-12-2017 03:38 PM - edited 08-18-2019 03:50 AM
Ambari says the NameNode is running and healthy.
So I restarted the NameNode and watched the log. It shows a "RetriableException: NameNode still not started":
2017-02-12 16:24:58,249 INFO  namenode.NameNode (LogAdapter.java:info(47)) - STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   user = hdfs
STARTUP_MSG:   host = node0/127.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.7.3.2.5.3.0-37
STARTUP_MSG:   classpath = ...
STARTUP_MSG:   build = git@github.com:hortonworks/hadoop.git -r 9828acfdec41a121f0121f556b09e2d112259e92; compiled by 'jenkins' on 2016-11-29T18:37Z
STARTUP_MSG:   java = 1.8.0_77
************************************************************/
2017-02-12 16:24:58,268 INFO  namenode.NameNode (LogAdapter.java:info(47)) - registered UNIX signal handlers for [TERM, HUP, INT]
2017-02-12 16:24:58,271 INFO  namenode.NameNode (NameNode.java:createNameNode(1600)) - createNameNode []
2017-02-12 16:24:58,709 INFO  impl.MetricsConfig (MetricsConfig.java:loadFirst(112)) - loaded properties from hadoop-metrics2.properties
...
2017-02-12 16:25:01,416 INFO  ipc.Server (Server.java:run(1045)) - IPC Server Responder: starting
2017-02-12 16:25:01,417 INFO  ipc.Server (Server.java:run(881)) - IPC Server listener on 8020: starting
2017-02-12 16:25:01,429 INFO  namenode.NameNode (NameNode.java:startCommonServices(876)) - NameNode RPC up at: node0/127.0.1.1:8020
2017-02-12 16:25:01,430 INFO  namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1130)) - Starting services required for active state
2017-02-12 16:25:01,436 INFO  blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(161)) - Starting CacheReplicationMonitor with interval 30000 milliseconds
2017-02-12 16:25:03,040 INFO  ipc.Server (Server.java:logException(2401)) - IPC Server handler 0 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 127.0.0.1:42972 Call#687 Retry#0
org.apache.hadoop.ipc.RetriableException: NameNode still not started
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:2057)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1414)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:118)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:29064)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
2017-02-12 16:25:03,080 INFO  fs.TrashPolicyDefault (TrashPolicyDefault.java:<init>(224)) - The configured checkpoint interval is 0 minutes. Using an interval of 360 minutes that is used for deletion instead
...
I can't include the complete log in this post; you can find it here: complete log
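One detail in this log may be worth checking: the NameNode reports `host = node0/127.0.1.1` and `NameNode RPC up at: node0/127.0.1.1:8020`, so on node0 the name node0 resolves to a loopback address. If the NameNode services bind to that address, they would accept connections only from node0 itself, which would be consistent with other nodes getting "connection refused" on node0:50070. This is a hypothesis, not a confirmed diagnosis. A minimal sketch to test it, assuming Python 3 and run on node0 (both function names are hypothetical):

```python
import ipaddress
import socket
from typing import List


def resolved_addresses(hostname: str) -> List[str]:
    """Return the distinct IP addresses this machine resolves hostname to."""
    infos = socket.getaddrinfo(hostname, None)
    # info[4] is the sockaddr tuple; its first element is the address string.
    # Any IPv6 zone suffix such as "%eth0" is dropped so ipaddress can parse it.
    return sorted({info[4][0].split("%")[0] for info in infos})


def resolves_to_loopback(hostname: str) -> bool:
    """True if any resolved address is a loopback address (127.0.0.0/8 or ::1)."""
    return any(
        ipaddress.ip_address(addr).is_loopback
        for addr in resolved_addresses(hostname)
    )
```

If `resolved_addresses("node0")` on node0 returns 127.0.1.1 instead of the VM's network IP, check that host's /etc/hosts; on Debian/Ubuntu systems this mapping usually comes from a `127.0.1.1 <hostname>` line added at install time.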
Created 02-10-2017 12:33 PM
Please log in to the Ambari UI and check whether the NameNode service is started.
From the logs you posted, it seems your NameNode is down:
Failed to connect to node0 port 50070: connection refused
It can also happen that the NameNode takes a long time to start, and if YARN is started before then, it tries to connect to the NameNode while the NameNode is still coming up.
In that scenario, make sure the NameNode is up and then restart the YARN service.
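The "wait until the NameNode is up, then restart YARN" step can be scripted. A minimal sketch, assuming Python 3 and reusing the WebHDFS URL from the original error; the function names, timeout and polling interval are arbitrary illustrative choices, not Ambari settings, and the YARN restart itself is still done from Ambari:

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# WebHDFS URL taken from the failing Ambari check in the original error.
URL = "http://node0:50070/webhdfs/v1/ats/done/?op=GETFILESTATUS&user.name=hdfs"


def webhdfs_responds(url: str, timeout: float = 5.0) -> bool:
    """True if the server returns any HTTP response, even an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # An HTTP error status (e.g. 404 for a missing path) still proves
        # the web server is accepting connections.
        err.close()
        return True
    except OSError:
        # URLError (connection refused, DNS failure) and timeouts: not up yet.
        return False


def wait_until(check: Callable[[], bool], timeout_s: float, interval_s: float) -> bool:
    """Call check() every interval_s seconds until it succeeds or timeout_s passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_until(lambda: webhdfs_responds(URL), timeout_s=600, interval_s=10)` returns True once node0:50070 answers; only then restart YARN. Separating the generic polling loop from the HTTP probe keeps the probe swappable (for instance with a plain TCP port check).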
Created 02-12-2017 03:40 PM
Jay SenSharma also said it could be a problem with the NameNode, so I posted the namenode.log, which says: "RetriableException: NameNode still not started" https://community.hortonworks.com/comments/83058/view.html