Ambari HDFS DataNodes Status always only 1 live

Explorer

Hi, we are using Ambari 2.7.3 HDFS 2.7.3.

We have a cluster with three DataNodes connected to one master. The DataNode and NodeManager processes start correctly on each data node, but only one DataNode is ever reported as live.

I don't see any refused connections or lost heartbeats from the slaves to the master. When I run hdfs dfsadmin -report, it shows one of the three, seemingly at random, as the live node, even after I run refreshNodes.

I can't see what I have missed or misconfigured. Can someone help?

Thanks!

5 REPLIES

Re: Ambari HDFS DataNodes Status always only 1 live

Contributor
@Meng Meng

The live DataNode status comes from the NameNode. It seems that only one DN is running and the other two are down. You can confirm this by opening the NameNode UI and clicking the Datanodes tab. Can you please check the DataNode logs to see why the other two DNs are not online?
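
If the UI is not handy, the same information can be read from the NameNode's JMX endpoint. The following is only a rough sketch under a few assumptions: that the NameNode web interface listens on the Hadoop 2.x default port 50070, that the NameNodeInfo bean exposes LiveNodes and DeadNodes as JSON-encoded strings, and that namenode.example.com is a placeholder for your own host. The function names are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: placeholder host and the default Hadoop 2.x NameNode HTTP port.
JMX_URL = (
    "http://namenode.example.com:50070/jmx"
    "?qry=Hadoop:service=NameNode,name=NameNodeInfo"
)


def parse_node_lists(jmx_payload):
    """Return (live, dead) DataNode names from a NameNodeInfo JMX payload."""
    bean = jmx_payload["beans"][0]
    # LiveNodes and DeadNodes are JSON documents embedded as strings.
    live = json.loads(bean.get("LiveNodes", "{}"))
    dead = json.loads(bean.get("DeadNodes", "{}"))
    return sorted(live), sorted(dead)


def fetch_node_lists(url=JMX_URL):
    """Fetch the JMX payload from the NameNode and parse it."""
    with urlopen(url, timeout=10) as resp:
        return parse_node_lists(json.load(resp))
```

Calling fetch_node_lists() a few times in a row and comparing the live list would show whether the single live node keeps changing, as described in the question.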

Re: Ambari HDFS DataNodes Status always only 1 live

Explorer

Hi @rgangappa,

To make debugging easier, I'm using only two DNs for now. I've attached an Ambari screenshot of the NameNode. In the meantime, I checked the DN logs at '/var/log/hadoop/hdfs/dn.log', and they look normal, as shown below. Any thoughts?

Thanks!

2016-12-08 00:08:17,059 INFO block.BlockTokenSecretManager (BlockTokenSecretManager.java:addKeys(193)) - Setting block keys
2016-12-08 00:08:20,060 INFO datanode.DataNode (BPOfferService.java:processCommandFromActor(609)) - DatanodeCommand action : DNA_REGISTER from ip-10-155-5-133.ec2.internal/10.155.5.133:8020 with active state
2016-12-08 00:08:20,061 INFO datanode.DataNode (BPServiceActor.java:register(687)) - Block pool BP-898464046-10.155.5.133-1477525432492 (Datanode Uuid 6aa02b04-c4b2-475c-af9f-0d698071189e) service to ip-10-155-5-133.ec2.internal/10.155.5.133:8020 beginning handshake with NN
2016-12-08 00:08:20,062 INFO datanode.DataNode (BPServiceActor.java:register(706)) - Block pool Block pool BP-898464046-10.155.5.133-1477525432492 (Datanode Uuid 6aa02b04-c4b2-475c-af9f-0d698071189e) service to ip-10-155-5-133.ec2.internal/10.155.5.133:8020 successfully registered with NN


Attachments: screen-shot-2016-12-07-at-40907-pm.png, screen-shot-2016-12-07-at-40924-pm.png

Re: Ambari HDFS DataNodes Status always only 1 live

Explorer

@rgangappa

A further update: I found some errors in the NameNode HDFS log that might be related.

2016-12-08 01:25:12,620 ERROR hdfs.StateChange (DatanodeManager.java:getDatanode(521)) - BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(10.155.5.194:50010, datanodeUuid=6aa02b04-c4b2-475c-af9f-0d698071189e, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-56;cid=CID-0b41d994-a84d-49df-926f-b73c2da5fa40;nsid=1518855958;c=0) is attempting to report storage ID 6aa02b04-c4b2-475c-af9f-0d698071189e. Node 10.155.4.167:50010 is expected to serve this storage.
2016-12-08 01:25:12,622 INFO hdfs.StateChange (DatanodeManager.java:registerDatanode(915)) - BLOCK* registerDatanode: from DatanodeRegistration(10.155.5.194:50010, datanodeUuid=6aa02b04-c4b2-475c-af9f-0d698071189e, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-56;cid=CID-0b41d994-a84d-49df-926f-b73c2da5fa40;nsid=1518855958;c=0) storage 6aa02b04-c4b2-475c-af9f-0d698071189e
2016-12-08 01:25:12,622 INFO hdfs.StateChange (DatanodeManager.java:registerDatanode(951)) - BLOCK* registerDatanode: 10.155.4.167:50010 is replaced by DatanodeRegistration(10.155.5.194:50010, datanodeUuid=6aa02b04-c4b2-475c-af9f-0d698071189e, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-56;cid=CID-0b41d994-a84d-49df-926f-b73c2da5fa40;nsid=1518855958;c=0) with the same storageID 6aa02b04-c4b2-475c-af9f-0d698071189e
2016-12-08 01:25:12,622 INFO net.NetworkTopology (NetworkTopology.java:remove(502)) - Removing a node: /default-rack/10.155.4.167:50010
2016-12-08 01:25:12,622 INFO net.NetworkTopology (NetworkTopology.java:add(426)) - Adding a new node: /default-rack/10.155.5.194:50010
2016-12-08 01:25:12,622 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateHeartbeatState(451)) - Number of failed storage changes from 0 to 0
2016-12-08 01:25:12,794 INFO BlockStateChange (UnderReplicatedBlocks.java:chooseUnderReplicatedBlocks(395)) - chooseUnderReplicatedBlocks selected 2 blocks at priority level 0; Total=2 Reset bookmarks? false
2016-12-08 01:25:12,794 INFO BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1580)) - BLOCK* neededReplications = 208, pendingReplications = 0.
2016-12-08 01:25:12,794 INFO blockmanagement.BlockManager (BlockManager.java:computeReplicationWorkForBlocks(1587)) - Blocks chosen but could not be replicated = 2; of which 2 have no target, 0 have no source, 0 are UC, 0 are abandoned, 0 already have enough replicas.

Re: Ambari HDFS DataNodes Status always only 1 live

Rising Star

@Meng Meng

At the risk of sounding stupid, I have to ask: can these nodes change their IP addresses somehow? It looks like the NameNode thinks this machine should have the IP address 10.155.4.167, while its actual address is 10.155.5.194. My apologies that this error message is so cryptic. I had to look at the source myself to understand what it meant.

In plain English, the error message says that the NameNode believes the DataNode with UUID 6aa02b04-c4b2-475c-af9f-0d698071189e has the IP address 10.155.4.167, but it is now connecting from 10.155.5.194. Another explanation might be that this machine has dual NICs with two different IP addresses bound.

Would it be possible to share the NameNode and DataNode logs of this machine?
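
To check how often this happens across a whole NameNode log, something like the sketch below might help. It scans log text for DataNode UUIDs that show up with more than one address. The regular expressions are modelled only on the log lines quoted in this thread, so treat them as assumptions that may need adjusting for other Hadoop versions; the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Matches e.g. "DatanodeRegistration(10.155.5.194:50010, datanodeUuid=6aa0..."
REGISTRATION = re.compile(
    r"DatanodeRegistration\(([\d.]+:\d+), datanodeUuid=([0-9a-f-]+)"
)
# Matches e.g. "registerDatanode: 10.155.4.167:50010 is replaced by
# DatanodeRegistration(10.155.5.194:50010, datanodeUuid=6aa0..."
REPLACED = re.compile(
    r"registerDatanode: ([\d.]+:\d+) is replaced by "
    r"DatanodeRegistration\([\d.]+:\d+, datanodeUuid=([0-9a-f-]+)"
)


def addresses_by_uuid(log_text):
    """Map each DataNode UUID to every address it was seen with."""
    seen = defaultdict(set)
    for addr, uuid in REGISTRATION.findall(log_text):
        seen[uuid].add(addr)
    # The replaced (old) address only appears before "is replaced by".
    for old_addr, uuid in REPLACED.findall(log_text):
        seen[uuid].add(old_addr)
    return seen


def shared_uuids(log_text):
    """Return only the UUIDs that appear with more than one address."""
    return {
        uuid: sorted(addrs)
        for uuid, addrs in addresses_by_uuid(log_text).items()
        if len(addrs) > 1
    }
```

Passing the contents of the NameNode log to shared_uuids() returns a dictionary of UUIDs and their addresses. A UUID listed with two addresses would be consistent with either explanation above: one machine whose address changes, or one identity being presented from two different addresses.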

Re: Ambari HDFS DataNodes Status always only 1 live

Just wanted to know whether you were able to resolve this issue, as I am facing the same one.