Support Questions

Find answers, ask questions, and share your expertise

Unable to start datanode - Too many failed volumes.

Contributor

Hi,

After reinstalling HDP 2.3, I am getting the following error when I try to restart the DataNode service.

org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 3, volumes configured: 9, volumes failed: 6, volume failures tolerated: 0
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:289)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1412)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1364)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:821)
        at java.lang.Thread.run(Thread.java:745)

When I dug into the data directories, I found that some of them contain directories left over from the prior installation. How can I fix this issue?

Thanks in advance.

Regards,

Subramanian S.

1 ACCEPTED SOLUTION


Subramanian Santhanam

Check your hdfs-site.xml for dfs.data.dir.

This is a comma-delimited list of directories. Remove the ones you do not need.

If the cluster is Ambari-managed, change this from Ambari instead:

HDFS -> Configs -> DataNode directories

Ensure that it is configured correctly.
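For example, if only three of the nine configured mounts are still healthy, the trimmed-down entry in hdfs-site.xml might look like the sketch below. The /grid/... mount points are placeholders, and dfs.datanode.data.dir is the current name for the same setting (dfs.data.dir is the older, deprecated alias):

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data</value>
</property>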


4 REPLIES


Contributor

Hi,

The issue is fixed. When I ran the cleanup script, it removed the users, so the data folders were left owned by users that no longer exist.

I fixed the ownership using chown, and now it is working fine. Thanks @Rahul Pathak and @Kuldeep Kulkarni.
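For reference, that kind of ownership fix looks roughly like this; the mount points and the hdfs:hadoop owner and group below are placeholders, so adjust them to your own layout:

chown -R hdfs:hadoop /grid/0/hadoop/hdfs/data
chown -R hdfs:hadoop /grid/1/hadoop/hdfs/data
# or, with a glob over all data mounts:
chown -R hdfs:hadoop /grid/*/hadoop/hdfs/data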

Regards,

Subramanian S.

Master Guru
@Subramanian Santhanam

Please check why you have 6 disk failures. As a workaround, you can do what Rahul suggested in his answer, or you can increase the value of the property below to allow the DataNode to tolerate X number of disk failures.

dfs.datanode.failed.volumes.tolerated - By default this is set to 0 in hdfs-site.xml
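For example, to let the DataNode start even if up to two data directories have failed, the property in hdfs-site.xml (or under HDFS -> Configs in Ambari) could be set like this; the value 2 here is only an illustration, so pick a number that matches how many volume failures you are willing to tolerate:

<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>2</value>
</property>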

New Contributor


<configuration>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop-3.3.4/data/namenode</value>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop-3.3.4/data/datanode</value>
  </property>

</configuration>

I have this configuration, but I still have problems.