Created 12-01-2017 05:28 PM
Hi All,
My Hadoop cluster runs on CentOS 6.7 and consists of 3 nodes. It has been up and running for more than a year now.
However, I have recently noticed the NameNode logging the warning "Failed to find datanode (scope="" excludedScope="/default")". The DataNode in question is the DataNode role running on the same server as the NameNode. I am not sure why this is happening. How can I fix it?
I am also seeing under-replicated blocks. Whether I set the replication manually or simply wait, the number of under-replicated blocks decreases, but after a few hours it increases again.
Please advise what to do.
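As far as I know, this warning is logged by the NameNode's block placement code when it cannot find an eligible DataNode for a replica. A possible first check is whether enough DataNodes are live for the configured replication factor. The sketch below assumes Hadoop 2.x `hdfs dfsadmin -report` output, which contains a line such as `Live datanodes (3):`; the function names are hypothetical helpers, not part of Hadoop.

```bash
# Sketch: compare the live DataNode count with the replication factor.
# Assumption: Hadoop 2.x "hdfs dfsadmin -report" output containing a line
# like "Live datanodes (3):". Function names are illustrative only.

# Read "hdfs dfsadmin -report" output on stdin; print the live DataNode count.
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9]*\)):.*/\1/p' | head -n 1
}

# $1 = live DataNode count, $2 = replication factor.
# Returns 0 if there are enough DataNodes, 1 if not, 2 if the count is missing.
check_replication_capacity() {
  local live=$1 repl=$2
  if [ -z "$live" ]; then
    echo "ERROR: could not read live DataNode count"
    return 2
  fi
  if [ "$live" -lt "$repl" ]; then
    echo "WARN: only $live live DataNode(s) for replication factor $repl"
    return 1
  fi
  echo "OK: $live live DataNode(s), replication factor $repl"
}

# Usage on the cluster as the hdfs user (not executed here):
#   live=$(hdfs dfsadmin -report | live_datanodes)
#   repl=$(hdfs getconf -confKey dfs.replication)
#   check_replication_capacity "$live" "$repl"
```

HDFS never places two replicas of the same block on one DataNode, so with 3 DataNodes and a replication factor of 3, every DataNode must be usable for every block.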
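Setting the replication manually can be limited to the affected files. This is a minimal sketch, assuming Hadoop 2.x `hdfs fsck` output in which each under-replicated block appears on a line that starts with the file path, a colon, and the text `Under replicated`. Only `hdfs fsck` and `hdfs dfs -setrep` are real commands; the two functions are hypothetical helpers.

```bash
# Sketch: find under-replicated files with fsck and reset their replication.
# Assumption: per-block fsck lines look like
#   /path/to/file:  Under replicated BP-...:blk_... Target Replicas is 3 ...

# Read "hdfs fsck /" output on stdin; print each affected file path once.
under_replicated_files() {
  grep 'Under replicated' | awk -F':' '{print $1}' | sort -u
}

# Read file paths on stdin and set each one's replication factor.
# $1 = target replication factor
fix_replication() {
  local target=$1 path
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    echo "Setting replication of $path to $target"
    # </dev/null keeps the hdfs client from consuming the loop's stdin.
    hdfs dfs -setrep "$target" "$path" </dev/null
  done
}

# Usage on the cluster as the hdfs user (not executed here):
#   hdfs fsck / | under_replicated_files | fix_replication 3
```

Paths that themselves contain a colon would be cut short by the `awk -F':'` split, so it is worth reviewing the list before applying it. Also, `setrep` cannot produce a third replica when fewer than three DataNodes are usable.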
Thanks,
Shilpa
Created 12-02-2017 10:08 PM
1. Can you ping the NameNode host from your DataNode host?
2. Have you tried running the balancer? Since it is a small cluster, I believe you can afford to.
3. What happens after you set the replication manually? Does the warning go away?
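The first check above can be run against several hosts at once with a small loop. This is a sketch: the host names in the usage line are placeholders, and `ping -c 1 -W 2` assumes the Linux iputils `ping` (one request, 2-second timeout).

```bash
# Sketch: send one ICMP echo request to each host and report the result.
# Returns non-zero if any host did not answer.
check_hosts() {
  local host rc=0
  for host in "$@"; do
    if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
      echo "$host reachable"
    else
      echo "$host UNREACHABLE"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on a DataNode host (placeholder names, not executed here):
#   check_hosts namenode.example.com datanode2.example.com
```

A successful ping only shows ICMP reachability; the DataNode also has to reach the NameNode's RPC address, which `hdfs getconf -nnRpcAddresses` prints.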
Created 12-04-2017 11:27 AM
Hi @csguna
Please find my replies below:
1. Can you ping the NameNode host from your DataNode host? Yes, I can. The DataNode runs on the same host as the NameNode.
[hdfs@<HOSTNAME> ~]$ ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.129 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.130 ms
^C
--- localhost ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1711ms
rtt min/avg/max/mdev = 0.129/0.129/0.130/0.011 ms
[hdfs@<HOSTNAME> ~]$ ping <HOSTNAME>
PING <FQDN> (10.0.0.4) 56(84) bytes of data.
64 bytes from <FQDN> (10.0.0.4): icmp_seq=1 ttl=64 time=0.121 ms
64 bytes from <FQDN> (10.0.0.4): icmp_seq=2 ttl=64 time=0.109 ms
^C
--- <FQDN> ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1774ms
rtt min/avg/max/mdev = 0.109/0.115/0.121/0.006 ms
[hdfs@<HOSTNAME> ~]$ ping <FQDN>
PING <FQDN> (10.0.0.4) 56(84) bytes of data.
64 bytes from <FQDN> (10.0.0.4): icmp_seq=1 ttl=64 time=0.120 ms
64 bytes from <FQDN> (10.0.0.4): icmp_seq=2 ttl=64 time=0.130 ms
^C
--- <FQDN> ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1706ms
rtt min/avg/max/mdev = 0.120/0.125/0.130/0.005 ms
[hdfs@<HOSTNAME> ~]$
PS: The only change I made to the cluster was closing its public IP for security, and I did that right after I brought the cluster up last year.
2. Have you tried running the balancer? Since it is a small cluster, I believe you can afford to. No, I didn't.
3. What happens after you set the replication manually? Does the warning go away? The under-replicated block count starts decreasing, but after a couple of hours it starts growing again.
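To see exactly when the under-replicated count grows back, it may help to log it at intervals and compare the timestamps with other events on the cluster. A minimal sketch, assuming the Hadoop 2.x `hdfs fsck` summary line ` Under-replicated blocks:` followed by the count; the functions are hypothetical helpers and the log path is a placeholder.

```bash
# Sketch: log the under-replicated block count with a timestamp.
# Assumption: Hadoop 2.x fsck summary line
#   " Under-replicated blocks:<TAB><TAB><N> (<P> %)"

# Read "hdfs fsck /" output on stdin; print the under-replicated block count.
under_replicated_count() {
  awk '/Under-replicated blocks:/ { print $3; exit }'
}

# Append "<date> <time> <count>" to the log file given as $1.
# Writes "unknown" if the summary line was not found.
log_under_replicated() {
  local count
  count=$(hdfs fsck / 2>/dev/null | under_replicated_count)
  echo "$(date '+%F %T') ${count:-unknown}" >> "$1"
}

# Usage as the hdfs user, e.g. from cron every 15 minutes (placeholder path):
#   log_under_replicated /tmp/under_replicated.log
```

A full `hdfs fsck /` walks the whole namespace, so on a large cluster the interval should be generous.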
Created 12-05-2017 08:14 AM
1. That is a bad deployment. Dedicate a separate host to the NameNode and decommission the DataNode on that host.
2. Try running the balancer, and let me know if that fixes it.
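The two suggestions above can be sketched as shell functions. This assumes a cluster that is not managed through Cloudera Manager (there, the DataNode role's Decommission action does this instead) and an `hdfs-site.xml` whose `dfs.hosts.exclude` points at the file passed to `decommission_datanode`. The function names, file path, and host name are hypothetical; `hdfs dfsadmin -refreshNodes`, `hdfs dfsadmin -setBalancerBandwidth`, and `hdfs balancer -threshold` are real commands.

```bash
# Sketch of both suggestions. Assumption: dfs.hosts.exclude in hdfs-site.xml
# names the exclude file passed to decommission_datanode.

# Step 1: decommission a DataNode.
# $1 = file named by dfs.hosts.exclude, $2 = DataNode host name
decommission_datanode() {
  local exclude_file=$1 host=$2
  # Append the host only if it is not already listed.
  if ! grep -qxF "$host" "$exclude_file" 2>/dev/null; then
    echo "$host" >> "$exclude_file"
  fi
  # Make the NameNode re-read its include/exclude files.
  hdfs dfsadmin -refreshNodes
}

# Step 2: run the balancer.
# $1 = threshold in percent (default 10, the balancer's own default)
# $2 = optional balancing bandwidth per DataNode, in bytes per second
run_balancer() {
  local threshold=${1:-10} bandwidth=${2:-}
  if [ -n "$bandwidth" ]; then
    hdfs dfsadmin -setBalancerBandwidth "$bandwidth"
  fi
  hdfs balancer -threshold "$threshold"
}

# Usage as the hdfs user (placeholder path and host name, not executed here):
#   decommission_datanode /etc/hadoop/conf/dfs.exclude nn-host.example.com
#   hdfs dfsadmin -report   # wait for "Decommission Status : Decommissioned"
#   run_balancer 10
```

If `dfs.replication` is 3, the two DataNodes left after decommissioning cannot hold three replicas of a block, so those blocks would stay under-replicated until another DataNode is added or the replication factor is lowered. The balancer only moves existing replicas to even out disk usage; it does not create missing ones.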