Created 08-01-2023 01:56 AM
Hi,
We have 3 datanodes, but one of them has far fewer blocks than the other two.
The count of under-replicated blocks is also high.
Logs from the datanode with the low block count:
2023-08-01 07:31:30,134 DEBUG datanode.DataNode: isDatanode=false, isClient=true, isTransfer=false
2023-08-01 07:31:30,134 DEBUG datanode.DataNode: writeBlock receive buf size 530480 tcp no delay true
2023-08-01 07:31:30,134 INFO datanode.DataNode: Receiving BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972 src: /10.128.66.223:58748 dest: /10.128.64.127:9866
2023-08-01 07:31:30,134 DEBUG datanode.DataNode: BlockReceiver: BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972
storageType=DISK, inAddr=/10.128.66.223:58748, myAddr=/10.128.64.127:9866
stage=PIPELINE_SETUP_CREATE, newGs=0, minBytesRcvd=0, maxBytesRcvd=0
clientname=DFSClient_NONMAPREDUCE_378619176_53, srcDataNode=:0, datanode=hdfs-datanode-0.hdfs-datanode.metering.svc.cluster.local:9866
requestedChecksum=DataChecksum(type=CRC32C, chunkSize=512)
cachingStrategy=CachingStrategy(dropBehind=null, readahead=null)
allowLazyPersist=false, pinning=false, isClient=true, isDatanode=false, responseInterval=30000, storageID=
2023-08-01 07:31:30,135 DEBUG datanode.DataNode: writeTo blockfile is /hadoop/dfs/data/current/BP-1419792351-10.128.20.54-1579223720671/current/rbw/blk_1122742296 of size 0
2023-08-01 07:31:30,135 DEBUG datanode.DataNode: writeTo metafile is /hadoop/dfs/data/current/BP-1419792351-10.128.20.54-1579223720671/current/rbw/blk_1122742296_49001972.meta of size 0
2023-08-01 07:31:30,136 DEBUG datanode.DataNode: PacketResponder: BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972, type=LAST_IN_PIPELINE: seqno=-2 waiting for local datanode to finish write.
2023-08-01 07:31:30,137 DEBUG datanode.DataNode: Receiving one packet for block BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972: PacketHeader with packetLen=31560 header data: offsetInBlock: 0
seqno: 0
lastPacketInBlock: false
dataLen: 31308
2023-08-01 07:31:30,138 DEBUG datanode.DataNode: PacketResponder: BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972, type=LAST_IN_PIPELINE: enqueue Packet(seqno=0, lastPacketInBlock=false, offsetInBlock=31308, ackEnqueueNanoTime=1058942549958316, ackStatus=SUCCESS)
2023-08-01 07:31:30,138 DEBUG datanode.DataNode: PacketResponder: BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972, type=LAST_IN_PIPELINE, replyAck=seqno: 0 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2023-08-01 07:31:30,138 DEBUG datanode.DataNode: PacketResponder: BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972, type=LAST_IN_PIPELINE: seqno=-2 waiting for local datanode to finish write.
2023-08-01 07:31:30,140 DEBUG datanode.DataNode: Receiving one packet for block BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972: PacketHeader with packetLen=4 header data: offsetInBlock: 31308
seqno: 1
lastPacketInBlock: true
dataLen: 0
The other 2 datanodes, by contrast, show logs in this format:
2023-08-01 07:35:47,430 DEBUG datanode.VolumeScanner: start scanning block BP-1419792351-10.128.20.54-1579223720671:blk_1097852465_24111684
2023-08-01 07:35:47,430 DEBUG datanode.DataNode: block=BP-1419792351-10.128.20.54-1579223720671:blk_1097852465_24111684, replica=FinalizedReplica, blk_1097852465_24111684, FINALIZED
getNumBytes() = 26874
getBytesOnDisk() = 26874
getVisibleLength()= 26874
getVolume() = /hadoop/dfs/data
getBlockURI() = file:/hadoop/dfs/data/current/BP-1419792351-10.128.20.54-1579223720671/current/finalized/subdir15/subdir6/blk_1097852465
bash-4.2$ hdfs fsck / | grep 'Under replicated' | grep /##/##/datasource_##/dt=2023-07-28/20230728_064016_04281_deje9_d6514f86-c17b-439c-8a7b-062141a490c9
/##/##/datasource_##/dt=2023-07-28/20230728_064016_04281_deje9_d6514f86-c17b-439c-8a7b-062141a490c9:
Under replicated BP-1419792351-10.128.20.54-1579223720671:blk_1122552021_48811695. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
Is there a way to fix the high number of under-replicated blocks?
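One common manual nudge is to re-apply the replication factor with `hdfs dfs -setrep`, which makes the NameNode schedule the missing replicas. A minimal sketch, assuming the fsck output format shown above; the helper name `list_under_replicated` is hypothetical (mine), not an HDFS command:

```shell
# list_under_replicated is a hypothetical helper (the name is mine):
# it reads `hdfs fsck` output on stdin and prints the affected file paths.
# It assumes each under-replicated block is reported on one line as
# "/path: Under replicated BP-...", as in the fsck excerpt above.
list_under_replicated() {
  grep 'Under replicated' | cut -d: -f1 | sort -u
}

# On a live cluster (requires the HDFS client on PATH):
#   hdfs fsck / | list_under_replicated | while read -r f; do
#     hdfs dfs -setrep -w 3 "$f"   # -w waits until replication completes
#   done
```

Note that this only helps if a third datanode is actually eligible to hold the replica; if block placement keeps rejecting that node (rack awareness, disk failures, full volumes), re-running setrep will not change the outcome.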
Created 08-04-2023 12:36 AM
@Noel_0317 Do you have rack awareness configured for the Datanodes?
Also, check for any disk-level issues on the datanode.
Try enabling DEBUG logging for block placement:
log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy=DEBUG
log4j.logger.org.apache.hadoop.hdfs.protocol.BlockStoragePolicy=DEBUG