Member since 12-02-2014
8 Posts
1 Kudos Received
1 Solution
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3269 | 04-16-2021 06:19 PM |
			
    
	
		
		
07-16-2024 03:25 PM
@GangWar @wert_1311 I have found HDFS files that are persistently under-replicated despite being over a year old. They are rare, but they are vulnerable to loss from a single disk failure.

To be clear, 'hdfs dfs -ls filename' shows the replication target, not the actual replica count. The actual count can be found with 'hdfs fsck filename -files -blocks'.

In theory, this situation should be transient, but I have found some cases where it persists. See the example below, where a file is 3 blocks long and one of the blocks has only one replica.

# hdfs fsck -blocks -files /tmp/part-m-03752
OUTPUT:
/tmp/part-m-03752: Under replicated BP-955733439-1.2.3.4-1395362440665:blk_1967769468_1100461809792. Target Replicas is 3 but found 1 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
/tmp/part-m-03752: Replica placement policy is violated for BP-955733439-1.2.3.4-1395362440665:blk_1967769468_1100461809792. Block should be additionally replicated on 1 more rack(s).
0. BP-955733439-1.2.3.4-1395362440665:blk_1967769089_1100461809406 len=134217728 Live_repl=3
1. BP-955733439-1.2.3.4-1395362440665:blk_1967769276_1100461809593 len=134217728 Live_repl=3
2. BP-955733439-1.2.3.4-1395362440665:blk_1967769468_1100461809792 len=40324081 Live_repl=1
Status: HEALTHY
Total size: 308759537 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 3 (avg. block size 102919845 B)
Minimally replicated blocks: 3 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 1 (33.333332 %)
Mis-replicated blocks: 1 (33.333332 %)
Default replication factor: 3
Average block replication: 2.3333333
Corrupt blocks: 0
Missing replicas: 2 (22.222221 %)
Number of data-nodes: 30
Number of racks: 3
The filesystem under path '/tmp/part-m-03752' is HEALTHY

# hadoop fs -ls /tmp/part-m-03752
OUTPUT:
-rw-r--r-- 3 wuser hadoop 308759537 2021-12-11 16:58 /tmp/part-m-03752

[sorry, code quoting is not working for me for some reason.]

Presumably, the file was under-replicated when it was written because of some failure, and the defaults for the dfs.client.block.write.replace-datanode-on-failure properties were such that new DNs were not obtained at write time to replace the ones that failed. The puzzling thing is: why has it not been re-replicated after all this time?
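For anyone who wants to scan for files like this, here is a minimal sketch that picks out the blocks whose live replica count is below the target from saved fsck output. It assumes the per-block lines look exactly like the ones in the output above (a `Live_repl=N` field after `len=`); other Hadoop versions may print this differently, so the pattern may need adjusting. The function name `find_under_replicated` and the `sample` text are only for illustration.

```python
import re

# Matches the per-block lines from `hdfs fsck <path> -files -blocks`, e.g.
# "2. BP-...:blk_1967769468_1100461809792 len=40324081 Live_repl=1"
# ASSUMPTION: the output format matches the example in this post.
BLOCK_RE = re.compile(r"(BP-\S+:blk_\d+_\d+)\s+len=(\d+)\s+Live_repl=(\d+)")


def find_under_replicated(fsck_output, target_replicas=3):
    """Return (block_id, length, live_replicas) for blocks below the target."""
    results = []
    for block_id, length, live in BLOCK_RE.findall(fsck_output):
        if int(live) < target_replicas:
            results.append((block_id, int(length), int(live)))
    return results


# Block lines copied from the fsck output in this post.
sample = """
/tmp/part-m-03752: Under replicated BP-955733439-1.2.3.4-1395362440665:blk_1967769468_1100461809792. Target Replicas is 3 but found 1 live replica(s)
0. BP-955733439-1.2.3.4-1395362440665:blk_1967769089_1100461809406 len=134217728 Live_repl=3
1. BP-955733439-1.2.3.4-1395362440665:blk_1967769276_1100461809593 len=134217728 Live_repl=3
2. BP-955733439-1.2.3.4-1395362440665:blk_1967769468_1100461809792 len=40324081 Live_repl=1
"""

for block_id, length, live in find_under_replicated(sample):
    print(f"{block_id} len={length} live replicas={live}")
```

The target of 3 is passed in by hand here because the block lines themselves do not carry it; it has to come from 'hdfs dfs -ls' or from the "Target Replicas is ..." message.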
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
04-16-2021 06:19 PM
Update: I moved SM to a host that has a typical load of 7-8 instead of 24. After a day on the new machine, no alerts have been generated about SM being slow and there are no gaps in the charts.

Conclusion: The problem was the load on the original host. SM works best on a machine with low load.
						
					