CDH3: disk failure, datanode doesn't start even after disk replacement
- Labels: Apache Hadoop, HDFS
Created on ‎04-15-2014 09:32 AM - edited ‎09-16-2022 01:57 AM
In our CDH3 cluster (hadoop-0.20.2, yes, it's pretty old 😉 ) a disk failed on one node, which took the datanode down.
After replacing the disk and setting up directories/permissions, starting the datanode still fails with this error:
2014-04-15 16:14:43,165 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 5, volumes configured: 6, volumes failed: 1, volume failures tolerated: 0
at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:1025)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:416)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:303)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1643)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1583)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1601)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1727)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1744)
I don't want to raise the number of tolerated volume failures to 1 just to be able to start the datanode 😉
br, Gerd
Created ‎04-15-2014 12:17 PM
Hi,
The issue has been solved. The problem was a mix-up between directory permissions and ownership: the owner had been set to 700 instead of the permissions (stupid thing 😉 ).
Nevertheless, the error message is somewhat misleading; it would be preferable if it reported that the user/permissions are incorrect.
Gerd
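
For anyone hitting the same "Too many failed volumes" message after a disk swap, one way to rule out this cause is to compare the owner and mode of every configured data directory against what the datanode expects. The following is only a minimal sketch in Python, not something from the thread: the function name `check_data_dir`, the user `hdfs`, and the example paths are assumptions for illustration, and the expected mode of 700 is taken from the post above. Adjust them to the directories listed in your own data directory setting and to the user that actually runs the datanode.

```python
import os
import pwd
import stat


def check_data_dir(path, expected_owner, expected_mode=0o700):
    """Return a list of human-readable problems found for one data directory."""
    problems = []
    try:
        st = os.stat(path)
    except OSError as exc:
        return [f"{path}: cannot stat ({exc.strerror})"]

    if not stat.S_ISDIR(st.st_mode):
        problems.append(f"{path}: not a directory")

    # Resolve the numeric uid to a user name; fall back to the raw uid
    # when no such user exists (e.g. after an accidental "chown 700").
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)
    if owner != expected_owner:
        problems.append(f"{path}: owner is {owner}, expected {expected_owner}")

    # Compare only the permission bits, not the file-type bits.
    mode = stat.S_IMODE(st.st_mode)
    if mode != expected_mode:
        problems.append(f"{path}: mode is {mode:o}, expected {expected_mode:o}")
    return problems


if __name__ == "__main__":
    # Hypothetical paths and user; substitute your own data directories.
    for d in ["/data/1/dfs/dn", "/data/2/dfs/dn"]:
        for problem in check_data_dir(d, expected_owner="hdfs"):
            print(problem)
```

A directory that was chowned to a numeric id by mistake would show up here as something like "owner is 700, expected hdfs", which is much easier to spot than the volume-count error in the datanode log.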
