I am facing issues with HDFS in my Cloudera Manager cluster. The cluster consists of 4 virtual machines (1 master and 3 slaves) in the cloud, and after every shutdown and restart of the cluster/VMs, HDFS reports some missing blocks.
I have found a "workaround": deleting the missing files and restarting the service clears the problem. However, the missing files sometimes contain important data belonging to other Hadoop services, and removing them causes issues in those services. Is there another solution that avoids deleting data?
Missing blocks usually have one of two causes:
1. Data corruption - Either the disk is corrupted or the VM is down. This is a permanent failure, meaning the data is lost.
2. Delay in DataNode report - The DataNodes have not yet reported their blocks to the NameNode. This is a temporary failure; once the DataNodes send their reports, the cluster returns to normal.
How to find and solve the issue:
1. Make sure there are no data volume failures or VM failures.
2. Ensure that the NameNode is receiving heartbeats from all DataNodes: NameNode UI -> Datanodes -> Last contact. If all DataNodes appear in the list and none of them is in the dead list or absent, there is no issue with receiving block reports. To debug beyond this, we would need the NameNode and DataNode logs.
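The same check can also be scripted instead of read off the UI. The following is a minimal sketch, assuming the `hdfs` CLI is on the PATH and that `hdfs dfsadmin -report` prints lines such as `Live datanodes (N):`, `Dead datanodes (N):` and `Missing blocks: N`; the exact report layout can differ between Hadoop versions, so treat the patterns as assumptions. The function names `summarize_report` and `fetch_report` are made up for this example.

```python
import re
import subprocess


def summarize_report(report: str) -> dict:
    """Pull DataNode counts and the missing-block count out of report text.

    The line formats matched here are assumptions about the output of
    `hdfs dfsadmin -report`; a value is None if its line is not found.
    """
    def first_int(pattern: str):
        match = re.search(pattern, report, re.MULTILINE)
        return int(match.group(1)) if match else None

    return {
        "live": first_int(r"^\s*Live datanodes \((\d+)\)"),
        "dead": first_int(r"^\s*Dead datanodes \((\d+)\)"),
        "missing_blocks": first_int(r"^\s*Missing blocks:\s*(\d+)"),
    }


def fetch_report() -> str:
    """Run `hdfs dfsadmin -report` and return its standard output."""
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-report"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a cluster node, `summarize_report(fetch_report())` would give a dictionary to compare against the expected number of DataNodes (3 in the cluster described above). Missing blocks while fewer than 3 DataNodes are live point to the temporary case; missing blocks with all DataNodes live deserve a closer look at the logs.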
Thank you for your response. So do you think the data is sometimes lost because I shut down the virtual machines every day? It is very unusual, since the issue does not happen every time: sometimes HDFS is corrupt and shows missing blocks, and sometimes it is healthy from the start, so I don't know what to make of it.
I suspect that your DataNode reports are slow. After the NameNode restarts you also trigger a restart of the DataNodes, and they take some time to come up and send their reports. During that interval you can expect missing blocks; this is a temporary issue. So wait a few more minutes and then check the NameNode UI. Otherwise, copy the logs while the issue is occurring and share them.
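The "wait a few minutes and check again" step can be automated with a small polling loop. This is only a sketch: `wait_for_no_missing_blocks` is a hypothetical helper, and `probe` stands for any callable of your own that returns the current missing-block count (for example one that reads it from `hdfs dfsadmin -report` or from the NameNode UI). The 10-minute timeout and 30-second interval are arbitrary defaults, not recommended values.

```python
import time
from typing import Callable


def wait_for_no_missing_blocks(
    probe: Callable[[], int],
    timeout_s: float = 600.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it reports 0 missing blocks or the timeout expires.

    Returns True if the count reached 0, False if the timeout was hit first.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if probe() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

If the function returns False after all DataNodes are back up, the missing blocks are probably not just a reporting delay, and that is the moment to collect the NameNode and DataNode logs.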
If this resolves your issue, please mark the answer as the accepted solution.
When I restarted the cluster earlier, there were also many missingBlocks before the DataNodes had restarted completely.
This can cause the missing blocks. For example, the NameNode has restarted but the DataNode restart is still in progress, so heartbeats from the DataNodes might be missed. To confirm this, check the NameNode UI after the restart, while the missing blocks are being reported.