I just upgraded our cluster from CDH 5.0.1 to 5.2.1, using parcels and following the provided instructions.
After the upgrade finished, the health test "Data Directory Status" is critical for one of the data nodes. The reported error message is "The DataNode has 1 volume failure(s)". By running 'hdfs dfsadmin -report' I can also confirm that the available HDFS space on that node is approximately 4 TB less than on the other nodes, which indicates that one of the disks is not being used.
However, when checking the status of the actual disks and the regular file system, we cannot find anything that seems wrong. All disks are mounted and appear to be working as they should. There is also an in_use.lock file in the dfs/nn directory on all of the disks.
How can I get more detailed information about which volume the DataNode is complaining about, and what the issue might be?
The source of this error has been found. It turned out that /etc/fstab on this node was badly configured, so that one of the disks was mounted twice and used as two separate data directories. Interestingly, this had not caused any visible errors until we upgraded to CDH 5.2.1. It is nice that this version pointed it out to us, though.
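For anyone who wants to check for the same misconfiguration, here is a minimal Python sketch that looks for a device listed at more than one mount point in fstab-style text. The function name, the sample entries and the /data/N mount points are hypothetical and only for illustration; they are not taken from our cluster.

```python
from collections import defaultdict

# Only real block devices are considered; pseudo filesystems such as
# tmpfs or proc can legitimately appear more than once.
DEVICE_PREFIXES = ("/dev/", "UUID=", "LABEL=")


def find_duplicate_mounts(fstab_text):
    """Return {device: [mount points]} for devices listed more than once."""
    mounts = defaultdict(list)
    for raw in fstab_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        device, mount_point = fields[0], fields[1]
        if not device.startswith(DEVICE_PREFIXES):
            continue
        mounts[device].append(mount_point)
    return {dev: pts for dev, pts in mounts.items() if len(pts) > 1}


# Hypothetical sample, not the real fstab from the affected node.
SAMPLE = """
# device    mount point  type  options           dump pass
/dev/sdb1   /data/1      ext4  defaults,noatime  0 0
/dev/sdc1   /data/2      ext4  defaults,noatime  0 0
/dev/sdc1   /data/3      ext4  defaults,noatime  0 0
"""

if __name__ == "__main__":
    for device, points in find_duplicate_mounts(SAMPLE).items():
        print(device, "is mounted at", ", ".join(points))
```

To run it against a real node, pass the contents of /etc/fstab instead of SAMPLE. Note that this only compares the device field as written, so a disk referenced once by path and once by UUID would not be caught.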
In my case I moved the existing dn directory aside, since it held very little data, and then started the services.
There was no disk failure, and no problem with the fstab file or the mounts.
Yes, in my case there was no fstab issue either. Instead, I could clearly identify the problem from the DataNode logs under /var/log/ and fix it. I have written a blog post about this, linked below. Please comment on the post if it helps.
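As a rough illustration of that kind of log check, here is a small Python sketch that keeps only WARN/ERROR/FATAL lines mentioning a volume or data directory. The keyword list and the function name are my own assumptions, and the exact wording of DataNode messages varies by version, so treat this as a starting filter rather than a definitive one. The log file path is passed on the command line because it depends on the installation.

```python
import re
import sys

# Log levels worth looking at when a volume is reported as failed.
LEVELS = re.compile(r"\b(WARN|ERROR|FATAL)\b")

# Assumed keywords; adjust them to what your DataNode version actually logs.
KEYWORDS = ("volume", "directory", "dfs.datanode.data.dir")


def suspicious_lines(lines, keywords=KEYWORDS):
    """Return lines with a WARN/ERROR/FATAL level that mention a keyword."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if LEVELS.search(line) and any(k in lowered for k in keywords):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_dn_log.py <path to the DataNode log file>
    with open(sys.argv[1], errors="replace") as log_file:
        for hit in suspicious_lines(log_file):
            print(hit)
```

The matching lines usually name the data directory involved, which answers the original question of which volume the DataNode is complaining about.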