
Can't start DataNode from Ambari cluster


When we start the DataNode on one of the worker machines we get:

ERROR datanode.DataNode (DataNode.java:secureMain(2691)) - Exception in secureMain org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 4, volumes configured: 5, volumes failed: 1, volume failures tolerated: 0

and this:

WARN checker.StorageLocationChecker (StorageLocationChecker.java:check(208)) - Exception checking StorageLocation [DISK]file:/grid/sdc/hadoop/hdfs/data/ org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /xxxx/sdc/hadoop/hdfs/data

What are the steps needed to fix it?

Michael-Bronson
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

WARN checker.StorageLocationChecker (StorageLocationChecker.java:check(208)) - Exception checking StorageLocation [DISK]file:/grid/sdc/hadoop/hdfs/data/ org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /xxxx/sdc/hadoop/hdfs/data

The above error can occur when the hard disk/filesystem has gone bad and the filesystem is in read-only mode. Remounting might help. Please check for any hardware errors, check the hard disk, and remount the volume.
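As a minimal sketch, assuming the affected mount point is /grid/sdc as in the log above and that HDFS runs as the "hdfs" service user (verify the actual device and user on your host), the checks could look like this:

    # Check whether the volume is currently mounted read-only (look for the "ro" flag)
    grep /grid/sdc /proc/mounts

    # Look for disk/filesystem errors reported by the kernel
    dmesg | grep -i sdc

    # If the filesystem flipped to read-only and the hardware is healthy, try remounting read-write
    mount -o remount,rw /grid/sdc

    # Confirm the DataNode data directory is writable by the HDFS service user
    sudo -u hdfs touch /grid/sdc/hadoop/hdfs/data/.write_test
    sudo -u hdfs rm -f /grid/sdc/hadoop/hdfs/data/.write_test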

Also, it is worth checking the "dfs.datanode.failed.volumes.tolerated" property in "/etc/hadoop/conf/hdfs-site.xml"; it sets how many failed volumes the DataNode will tolerate.

<property>
     <name>dfs.datanode.failed.volumes.tolerated</name>
     <value>1</value>
</property> 



13 REPLIES


Hi Aditya, on each worker machine we have 5 volumes, and we do not want to stay with 4 volumes on the problematic worker, so regarding option 2, we do not want to remove the volume. Second, what is the meaning of setting dfs.datanode.failed.volumes.tolerated to 1? After an HDFS restart, will it fix the problem?

Michael-Bronson

Super Guru

@Michael Bronson,

If you set dfs.datanode.failed.volumes.tolerated to 'x', it will allow a maximum of 'x' volumes to fail before the DataNode refuses to start. So an HDFS restart should fix it.
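For example, once the DataNode is back up you can verify it registered from any node with the HDFS client configured (assuming "hdfs" is the service user):

    # The report should now list the DataNode on the problematic worker as a live node
    sudo -u hdfs hdfs dfsadmin -report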


Another remark: if I set this value to 1, does it mean that HDFS will start up even though the volume is bad or not in use?

Michael-Bronson

Super Guru
@Michael Bronson

Yes, it will start up in spite of the bad volume. If you don't want this to happen, you will have to replace the failed volume with a new volume (i.e. unmount the old one and mount the new one).
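A rough sketch of that replacement, assuming the failed volume is the one mounted at /grid/sdc from the log above, that the new device appears as /dev/sdc1 (check with lsblk), and that HDFS runs as hdfs:hadoop; stop the DataNode in Ambari first:

    # Unmount the failed volume
    umount /grid/sdc

    # After swapping the disk, create a filesystem on the new device
    mkfs -t ext4 /dev/sdc1

    # Mount it at the same mount point (and update /etc/fstab accordingly)
    mount /dev/sdc1 /grid/sdc

    # Recreate the DataNode data directory with the ownership HDFS expects
    mkdir -p /grid/sdc/hadoop/hdfs/data
    chown -R hdfs:hadoop /grid/sdc/hadoop/hdfs/data

    # Then start the DataNode again from Ambari

The mount point, device name, filesystem type, and hdfs:hadoop ownership above are assumptions; adjust them to match the other data volumes on the worker.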