Created 11-10-2017 10:54 AM
When we start the DataNode on one of the worker machines, we get:
ERROR datanode.DataNode (DataNode.java:secureMain(2691)) - Exception in secureMain org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 4, volumes configured: 5, volumes failed: 1, volume failures tolerated: 0
and this:
WARN checker.StorageLocationChecker (StorageLocationChecker.java:check(208)) - Exception checking StorageLocation [DISK]file:/grid/sdc/hadoop/hdfs/data/ org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /xxxx/sdc/hadoop/hdfs/data
What are the steps needed to fix it?
Created 11-10-2017 12:31 PM
WARN checker.StorageLocationChecker (StorageLocationChecker.java:check(208)) - Exception checking StorageLocation [DISK]file:/grid/sdc/hadoop/hdfs/data/ org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not writable: /xxxx/sdc/hadoop/hdfs/data
The above error can occur when the hard disk/filesystem has gone bad and the filesystem is mounted read-only. Check the disk for hardware errors and try remounting the volume read-write.
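For example, assuming the affected volume is mounted at /grid/sdc (taken from the log above; adjust to your mount point), something like the following can confirm the state of the disk:

# Check whether the filesystem has fallen back to read-only ("ro" in the mount options)
grep /grid/sdc /proc/mounts
# Look for I/O errors from the disk in the kernel log
dmesg | grep -i sdc
# Try remounting read-write; if the disk is genuinely failing this may not stick
mount -o remount,rw /grid/sdc
# Verify the DataNode directory is writable again as the hdfs user
sudo -u hdfs touch /grid/sdc/hadoop/hdfs/data/write_test && sudo -u hdfs rm /grid/sdc/hadoop/hdfs/data/write_test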
Also, check the "dfs.datanode.failed.volumes.tolerated" property in "/etc/hadoop/conf/hdfs-site.xml"; it sets how many failed volumes the DataNode will tolerate.
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
Created 11-10-2017 12:50 PM
Hi Aditya, on each worker machine we have 5 volumes, and we do not want to stay with 4 volumes on the problematic workers, so regarding option 2, we do not want to remove the volume. Second, what does setting dfs.datanode.failed.volumes.tolerated to 1 actually mean? Will the problem be fixed after an HDFS restart?
Created 11-10-2017 12:56 PM
If you set dfs.datanode.failed.volumes.tolerated to 'x', the DataNode will tolerate up to 'x' failed volumes and still start. So an HDFS restart should fix it.
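For reference, a manual DataNode restart (assuming a standard Hadoop 2.x layout with $HADOOP_HOME set; on an Ambari-managed cluster you would restart the DataNode from Ambari instead) looks roughly like:

# Restart the DataNode so it re-checks its volumes against the new tolerance
su - hdfs
$HADOOP_HOME/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop datanode
$HADOOP_HOME/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode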
Created 11-10-2017 01:14 PM
Another question: if I set this value to 1, does it mean that HDFS will start up even though the volume is bad, or just that the volume is not in use?
Created 11-10-2017 01:22 PM
Yes, it will start up in spite of the bad volume. If you don't want this to happen, you will have to replace your failed volume with a new volume (i.e., unmount the old one and mount the new one).
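As a rough sketch, assuming the failed disk is /dev/sdc1 mounted at /grid/sdc (both hypothetical, based on the log above) and the usual hdfs:hadoop ownership, replacing the volume would go something like:

# Stop the DataNode before touching its storage directories
su - hdfs -c '$HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode'
# Unmount the bad disk
umount /grid/sdc
# ... physically replace the disk, then create a fresh filesystem on it ...
mkfs -t ext4 /dev/sdc1
mount /dev/sdc1 /grid/sdc
# Recreate the DataNode directory with the right ownership
mkdir -p /grid/sdc/hadoop/hdfs/data
chown -R hdfs:hadoop /grid/sdc/hadoop/hdfs/data
# Start the DataNode again; HDFS will re-replicate any blocks lost with the old disk
su - hdfs -c '$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode'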