10-10-2016
08:31 AM
Providing some additional detail for later reference. Manikumar's notes above pertain only to under-replicated blocks, not to missing blocks as in the original problem statement. Missing blocks are blocks for which the Namenode has determined that _all_ copies are missing from the environment. Under-replicated blocks, by contrast, are blocks for which the Namenode has determined that some, but not all, of the copies are missing.

As mentioned above, under-replicated blocks should be recovered automatically by HDFS: the Namenode coordinates the increase in replication for each block through the Datanodes. Under-replicated blocks often appear after a hardware failure, and it can take some time to re-replicate all of the affected blocks to another disk or Datanode.

There are a couple of ways to monitor under-replicated blocks.

1) For clusters with Cloudera Manager installed:
- Click the "Charts" link at the top of the screen.
- Click "Chart Builder".
- Use the following query: "select under_replicated_blocks;"
This displays a plot of the under-replicated blocks over time. If the value is decreasing, keep monitoring it until it drops to 0, and make sure that all Datanodes are healthy and available.

2) For clusters without Cloudera Manager:
The Namenode reports the under-replicated block count through its web UI in two ways:
- http://namenode.example.com:50070/dfshealth.html#tab-overview (look for "Under-Replicated")
- http://namenode.example.com:50070/jmx (look for "UnderReplicatedBlocks")
* The host names and ports will differ for your cluster.

Running the balancer will not change the replication of blocks. The balancer moves blocks between Datanodes based on how each node's disk utilization compares with the average disk utilization of the cluster. Its throughput is typically limited so that balancing can run as a background task, while normal recovery of under-replicated blocks is not throttled by the balancer's bandwidth limit.
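For clusters without Cloudera Manager, the JMX check above can be scripted. This is a minimal sketch, assuming the Namenode's /jmx servlet accepts the qry filter and exposes the UnderReplicatedBlocks and MissingBlocks attributes on the FSNamesystem bean; the URL and the helper names (parse_block_counts, fetch_block_counts, trend) are hypothetical and should be adapted to your cluster. The trend() helper only compares the first and last samples, a deliberately crude stand-in for reading the Cloudera Manager chart.

```python
import json
import urllib.request

# Hypothetical address: substitute your Namenode host and HTTP port.
FSNAMESYSTEM_BEAN = "Hadoop:service=NameNode,name=FSNamesystem"
JMX_URL = "http://namenode.example.com:50070/jmx?qry=" + FSNAMESYSTEM_BEAN


def parse_block_counts(jmx_text):
    """Extract under-replicated and missing block counts from a /jmx JSON response."""
    for bean in json.loads(jmx_text).get("beans", []):
        if bean.get("name") == FSNAMESYSTEM_BEAN:
            return {
                "UnderReplicatedBlocks": int(bean["UnderReplicatedBlocks"]),
                "MissingBlocks": int(bean["MissingBlocks"]),
            }
    raise ValueError("FSNamesystem bean not found in JMX response")


def fetch_block_counts(url=JMX_URL, timeout=10):
    """Query the Namenode JMX servlet over HTTP and parse the block counters."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_block_counts(resp.read().decode("utf-8"))


def trend(samples):
    """Crudely classify UnderReplicatedBlocks samples by comparing first and last values."""
    if not samples:
        return "unknown"
    if samples[-1] == 0:
        return "recovered"
    if len(samples) < 2:
        return "unknown"
    if samples[-1] < samples[0]:
        return "decreasing"
    if samples[-1] == samples[0]:
        return "steady"
    return "increasing"
```

Polling fetch_block_counts() every few minutes and passing the collected UnderReplicatedBlocks values to trend() gives roughly the same decreasing-versus-steady signal as the chart. A non-zero MissingBlocks value is the separate, more serious case described at the top of this post.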
If the under-replicated block count is not decreasing but staying steady, more investigation is necessary. Here are some questions to ask:

Is this a small cluster (for example, 3 nodes, or fewer than 10)? If so:
- Is the default replication greater than the number of live Datanodes?
- Is the value of mapreduce.client.submit.file.replication greater than the number of live Datanodes? When a MapReduce job is submitted, its job files are written to HDFS with mapreduce.client.submit.file.replication copies (the default is 10). If this is larger than the number of nodes in the cluster, you will always have under-replicated blocks.

Is the cluster larger? If so:
- Is the network unhealthy? If Datanodes frequently lose contact with the cluster, the Namenode may wrongly mark blocks as under-replicated. http://namenode.example.com:50070/dfshealth.html#tab-datanode shows when each Datanode last contacted the Namenode. Work with your networking team to validate the environment, and make sure that top-of-rack switches and other networking hardware are healthy and not oversubscribed.
- Are racks configured in the cluster, and is one rack entirely down? This can cause under-replicated blocks that are impossible to resolve, because when more than one rack is configured HDFS will not store all three replicas of a block within one rack. If you have only two racks and one is down, the under-replication cannot be resolved until that rack is healthy again.

Is the problem limited to specific files? The replication configured through Cloudera Manager, or through dfs.replication in hdfs-site.xml on installations without Cloudera Manager, only sets the default. Individual users can set a different replication factor when they create a file. This is unusual, but it can happen. The following command shows all files that are not currently open for writing.
hdfs fsck / -files
In the output, look for "Target Replicas is X but found Y replica(s)". If X is larger than the number of available Datanodes, or differs from the default replication, you can change the replication of that file, for example:
hdfs dfs -setrep 3 /path/to/strangefile
(Also note that "hdfs dfs -ls -R /" shows the desired replication of each file in its second column, and "hdfs fsck / -blocks -files -locations" provides a very detailed view of every block in your cluster. Any of these commands may take a long time on a large cluster.)
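On a large cluster the fsck output can be very long, so a small filter helps apply the rule above. This is a minimal sketch, assuming each relevant line starts with the file path (optionally followed by a colon) and contains "Target Replicas is X but found Y"; the exact layout varies between Hadoop versions, so treat the regular expression as an assumption to adjust, and note that paths containing spaces are not handled. suspicious_files() is a hypothetical helper, not part of Hadoop.

```python
import re

# Assumed line shapes (they vary between Hadoop versions), e.g.
#   /data/a.csv 1024 bytes, 1 block(s):  Under replicated BP-1:blk_1_1001. Target Replicas is 5 but found 3 replica(s).
#   /data/a.csv:  Under replicated BP-1:blk_1_1001. Target Replicas is 5 but found 3 live replica(s).
TARGET_RE = re.compile(r"^(\S+?):?\s.*?Target Replicas is (\d+) but found (\d+)")


def suspicious_files(fsck_lines, live_datanodes, default_replication=3):
    """Yield (path, target, found, reason) once per file whose target replication looks wrong."""
    seen = set()
    for line in fsck_lines:
        match = TARGET_RE.search(line)
        if not match or match.group(1) in seen:
            continue
        path, target, found = match.group(1), int(match.group(2)), int(match.group(3))
        if target > live_datanodes:
            reason = "target exceeds live Datanodes"
        elif target != default_replication:
            reason = "target differs from default replication"
        else:
            continue  # Plausible target: more likely a node, rack, or network problem.
        seen.add(path)
        yield path, target, found, reason
```

Saving the output once (for example with hdfs fsck / -files > fsck.txt) and passing the open file plus your live Datanode count to suspicious_files() gives a short list of candidates for hdfs dfs -setrep. Review each one before changing anything, since a non-default replication factor may have been set deliberately.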
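The small-cluster and rack questions earlier in this post boil down to a few comparisons. The following is a simplified sketch, not an HDFS API: the function name and parameters are hypothetical, submit_file_replication defaults to 10 (the shipped default of mapreduce.client.submit.file.replication), and the rack check assumes the default placement policy, which spreads each block's replicas across at least two racks when more than one rack is configured.

```python
def persistent_under_replication_causes(live_datanodes, default_replication,
                                        submit_file_replication=10,
                                        racks_configured=1, racks_alive=1):
    """Return simplified reasons why the under-replicated count may never reach 0."""
    causes = []
    # Each replica of a block must live on a different Datanode.
    if default_replication > live_datanodes:
        causes.append("dfs.replication exceeds live Datanodes")
    # Job submission files are written with this replication factor.
    if submit_file_replication > live_datanodes:
        causes.append("mapreduce.client.submit.file.replication exceeds live Datanodes")
    # With multiple racks configured, replicas must span at least two racks.
    if racks_configured > 1 and racks_alive < 2:
        causes.append("fewer than two racks alive, so replicas cannot span racks")
    return causes
```

For example, persistent_under_replication_causes(3, 3) flags only the submit-file setting on a 3-node cluster that keeps the shipped default of 10, which matches the small-cluster case described above.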