
Failed to find datanode, suggest to check cluster health. excludeDatanodes=null

Explorer

Hello guys,

I have set up a cluster, and HDFS is up and running.

But MapReduce2 and Spark are not starting. They fail with the error below:

{
  "RemoteException": {
    "exception": "IOException", 
    "javaClassName": "java.io.IOException", 
    "message": "Failed to find datanode, suggest to check cluster health. excludeDatanodes=null"
  }
}

HBase also fails with the following error:

org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/hbase/data/.tmp/hbase.version could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.

Somehow the datanodes are not being detected.

I updated the hosts file on all nodes with the private IP and private DNS, but I am still getting the same error.

Can someone please help me with this?

Thanks,

Nirmal J

1 ACCEPTED SOLUTION

avatar
Master Mentor

@Nirmal J

I see that the rpc-address is set to a short hostname, but the actual "hostname -f" output on the NameNode is "ip-10-0-223-116.ec2.internal". So ideally the rpc-address below should be using "ip-10-0-223-116.ec2.internal:8020" instead of "ip-10-0-223-116:8020":

# grep -B 2 -A 2 'rpc-address' /Users/jsensharma/Downloads/41589-hdfs-site.xml 
  <property>
  	<name>dfs.namenode.rpc-address</name>
  	<value>ip-10-0-223-116:8020</value>
  </property>

.

The same goes for "fs.defaultFS": it should not be using "localhost". That seems to be causing the issue here, and it should also be changed to the hostname of the NameNode:

# grep -B 2 -A 2 'localhost' /Users/jsensharma/Downloads/41588-core-site.xml 
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:8020</value>
      <final>true</final>
    </property>


It is also strange that other addresses, such as the following, are set to "localhost":

# grep -B 2 -A 2 'localhost' /Users/jsensharma/Downloads/41589-hdfs-site.xml 
  <property>
  	<name>dfs.namenode.http-address</name>
  	<value>localhost:50070</value>
  	<final>true</final>
  	</property>

  <property>
  	<name>dfs.namenode.https-address</name>
  	<value>localhost:50470</value>
  </property>

  <property>
  	<name>dfs.namenode.secondary.http-address</name>
  	<value>localhost:50090</value>
  </property>





So can you please try the following:


1. Edit the "/etc/hosts" file on all hosts (including the DataNodes and the NameNode) and add the line below to what is already there. Keep "ip-10-0-223-116.ec2.internal" first, right after the IP address:

10.0.223.116     ip-10-0-223-116.ec2.internal    ip-10-0-223-116
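As a quick sanity check (a sketch, using the IP and names from this thread), the ordering of that entry can be verified with a one-liner, since the resolver treats the first name after the IP as the canonical one:

```shell
# The first name after the IP is what "hostname -f"-style resolution prefers,
# so the FQDN should appear in field 2 of the /etc/hosts line.
LINE="10.0.223.116     ip-10-0-223-116.ec2.internal    ip-10-0-223-116"
echo "$LINE" | awk '{print $2}'   # → ip-10-0-223-116.ec2.internal
```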


2. Edit your "hdfs-site.xml" file and replace the "localhost" addresses mentioned above with "ip-10-0-223-116.ec2.internal" (if that does not work, try "ip-10-0-223-116" on the next attempt).
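A minimal sketch of that edit, run here against a throwaway sample file rather than the live config (on a real cluster the file path, e.g. /etc/hadoop/conf/hdfs-site.xml, is an assumption and may differ, and you should back the file up first):

```shell
# Replace every "localhost" value with the NameNode FQDN, as in step 2.
FQDN="ip-10-0-223-116.ec2.internal"
cat > /tmp/hdfs-site-sample.xml <<'EOF'
<property>
  <name>dfs.namenode.http-address</name>
  <value>localhost:50070</value>
</property>
EOF
sed -i "s/localhost/${FQDN}/g" /tmp/hdfs-site-sample.xml
grep '<value>' /tmp/hdfs-site-sample.xml   # the value line now carries the FQDN
```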

3. Restart your HDFS services after making these changes. I am not sure whether you should be using "ip-10-0-223-116.ec2.internal" or "ip-10-0-223-116", but based on the "hostname -f" output it should be "ip-10-0-223-116.ec2.internal"; if that does not work, try both hostnames one by one to see which one works.

.


23 REPLIES

Master Mentor

@Nirmal J

Your NameNode is listening on 127.0.0.1:8020, i.e. on the loopback interface only, so port 8020 cannot be reached remotely.

So you should check whether the "dfs.namenode.rpc-address" property of your HDFS is correctly configured as "$HOSTNAME:8020",

and whether that hostname resolves to the IP address of the NameNode host.

Can you please check and share the core-site.xml / hdfs-site.xml so we can verify that the NameNode address is correct and that it is listening on the hostname (FQDN) and not on 127.0.0.1?

.

This normally happens in one of two cases:

1. The "/etc/hosts" entry on the NameNode is not correct.

2. The other network interface addresses do not resolve.
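The loopback symptom is easy to reproduce. The sketch below starts a throwaway listener bound to 127.0.0.1 (a stand-in for the NameNode, on an arbitrary port 18020) and inspects it with `ss`, the same check you would run against port 8020 on the NameNode host:

```shell
# A service bound to 127.0.0.1 is invisible to remote hosts; the
# "Local Address" column of `ss -ltn` makes this obvious.
python3 -m http.server 18020 --bind 127.0.0.1 >/dev/null 2>&1 &   # stand-in listener
PID=$!
sleep 1
ss -ltn | grep 18020    # local address shows 127.0.0.1:18020, not the host IP
kill "$PID"
```

If the NameNode shows 127.0.0.1:8020 here instead of its FQDN or 0.0.0.0, remote DataNodes cannot register with it.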


Explorer

Thanks a lot @Jay SenSharma

This fixed the issue.

Thanks again for your time

Master Mentor

@Nirmal J

Good to know that the issue is resolved.

Since it is resolved, it would also be great if you could mark this HCC thread as answered by clicking the "Accept" button on the correct answer. That way, other HCC users can quickly find the solution when they encounter the same issue.