Datanode low number of blocks

Hi,

We have 3 DataNodes, but one of them does not have the expected number of blocks compared with the other two.
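To compare block counts from the command line, the per-DataNode sections of hdfs dfsadmin -report can be summarized. This is a minimal sketch. It assumes each DataNode section starts with a "Name: <ip:port> (<host>)" line and contains a "Num of Blocks: <n>" line. As far as we know, Hadoop 3.x reports print that field, but older releases may omit it. The function name blocks_per_datanode is hypothetical.

```shell
# Hypothetical helper: print "<datanode address> <block count>" for each
# DataNode section in `hdfs dfsadmin -report` output read from stdin.
# Assumes "Name: <ip:port> (<host>)" opens each section and a
# "Num of Blocks: <n>" line follows it; nodes without that line are skipped.
blocks_per_datanode() {
  awk '
    /^Name: /          { node = $2 }        # remember the current DataNode
    /^Num of Blocks: / { print node, $4 }   # 4th field is the count
  '
}

# Usage on the cluster:
#   hdfs dfsadmin -report | blocks_per_datanode
```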

Also, the count of under-replicated blocks is high.

Logs from the DataNode with the low block count:

2023-08-01 07:31:30,134 DEBUG datanode.DataNode: isDatanode=false, isClient=true, isTransfer=false
2023-08-01 07:31:30,134 DEBUG datanode.DataNode: writeBlock receive buf size 530480 tcp no delay true
2023-08-01 07:31:30,134 INFO datanode.DataNode: Receiving BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972 src: /10.128.66.223:58748 dest: /10.128.64.127:9866
2023-08-01 07:31:30,134 DEBUG datanode.DataNode: BlockReceiver: BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972
storageType=DISK, inAddr=/10.128.66.223:58748, myAddr=/10.128.64.127:9866
stage=PIPELINE_SETUP_CREATE, newGs=0, minBytesRcvd=0, maxBytesRcvd=0
clientname=DFSClient_NONMAPREDUCE_378619176_53, srcDataNode=:0, datanode=hdfs-datanode-0.hdfs-datanode.metering.svc.cluster.local:9866
requestedChecksum=DataChecksum(type=CRC32C, chunkSize=512)
cachingStrategy=CachingStrategy(dropBehind=null, readahead=null)
allowLazyPersist=false, pinning=false, isClient=true, isDatanode=false, responseInterval=30000, storageID=
2023-08-01 07:31:30,135 DEBUG datanode.DataNode: writeTo blockfile is /hadoop/dfs/data/current/BP-1419792351-10.128.20.54-1579223720671/current/rbw/blk_1122742296 of size 0
2023-08-01 07:31:30,135 DEBUG datanode.DataNode: writeTo metafile is /hadoop/dfs/data/current/BP-1419792351-10.128.20.54-1579223720671/current/rbw/blk_1122742296_49001972.meta of size 0
2023-08-01 07:31:30,136 DEBUG datanode.DataNode: PacketResponder: BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972, type=LAST_IN_PIPELINE: seqno=-2 waiting for local datanode to finish write.
2023-08-01 07:31:30,137 DEBUG datanode.DataNode: Receiving one packet for block BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972: PacketHeader with packetLen=31560 header data: offsetInBlock: 0
seqno: 0
lastPacketInBlock: false
dataLen: 31308

2023-08-01 07:31:30,138 DEBUG datanode.DataNode: PacketResponder: BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972, type=LAST_IN_PIPELINE: enqueue Packet(seqno=0, lastPacketInBlock=false, offsetInBlock=31308, ackEnqueueNanoTime=1058942549958316, ackStatus=SUCCESS)
2023-08-01 07:31:30,138 DEBUG datanode.DataNode: PacketResponder: BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972, type=LAST_IN_PIPELINE, replyAck=seqno: 0 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2023-08-01 07:31:30,138 DEBUG datanode.DataNode: PacketResponder: BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972, type=LAST_IN_PIPELINE: seqno=-2 waiting for local datanode to finish write.
2023-08-01 07:31:30,140 DEBUG datanode.DataNode: Receiving one packet for block BP-1419792351-10.128.20.54-1579223720671:blk_1122742296_49001972: PacketHeader with packetLen=4 header data: offsetInBlock: 31308
seqno: 1
lastPacketInBlock: true
dataLen: 0

The other two DataNodes, by contrast, both show logs in this format:

2023-08-01 07:35:47,430 DEBUG datanode.VolumeScanner: start scanning block BP-1419792351-10.128.20.54-1579223720671:blk_1097852465_24111684
2023-08-01 07:35:47,430 DEBUG datanode.DataNode: block=BP-1419792351-10.128.20.54-1579223720671:blk_1097852465_24111684, replica=FinalizedReplica, blk_1097852465_24111684, FINALIZED
  getNumBytes()     = 26874
  getBytesOnDisk()  = 26874
  getVisibleLength()= 26874
  getVolume()       = /hadoop/dfs/data
  getBlockURI()     = file:/hadoop/dfs/data/current/BP-1419792351-10.128.20.54-1579223720671/current/finalized/subdir15/subdir6/blk_1097852465

bash-4.2$ hdfs fsck / | grep 'Under replicated' | grep /##/##/datasource_##/dt=2023-07-28/20230728_064016_04281_deje9_d6514f86-c17b-439c-8a7b-062141a490c9
/##/##/datasource_##/dt=2023-07-28/20230728_064016_04281_deje9_d6514f86-c17b-439c-8a7b-062141a490c9:
Under replicated BP-1419792351-10.128.20.54-1579223720671:blk_1122552021_48811695. Target Replicas is 3 but found 2 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
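To see which files the under-replicated blocks belong to, the same fsck output can be tallied per file. This is a minimal sketch. It assumes each entry is a single line of the form "<path>: Under replicated <block>. Target Replicas is ...". The forum appears to have wrapped that line in two above, since grep printed both halves. Paths that themselves contain ":" would be truncated. The function name under_replicated_files is hypothetical.

```shell
# Hypothetical helper: count under-replicated blocks per file in `hdfs fsck`
# output read from stdin, printing "<count> <path>", highest count first.
# Assumes the path is everything before the first ":" on each
# "<path>: Under replicated ..." line; the hyphenated summary line
# ("Under-replicated blocks:") does not match the pattern.
under_replicated_files() {
  awk -F':' '
    /: *Under replicated / { n[$1]++ }
    END { for (p in n) print n[p], p }
  ' | sort -rn
}

# Usage on the cluster:
#   hdfs fsck / | under_replicated_files | head -20
```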

Is there a way to fix the high number of under-replicated blocks?

Reply (Expert Contributor):

@Noel_0317 Do you have rack awareness configured for the Datanodes?
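One way to answer the rack-awareness question is to run hdfs dfsadmin -printTopology. It prints a "Rack: <name>" header followed by the DataNodes in that rack. This minimal sketch tallies DataNodes per rack from that output and assumes that layout. The name rack_summary is hypothetical. If all three DataNodes are listed under /default-rack, most likely no topology mapping is configured.

```shell
# Hypothetical helper: print "<rack> <number of DataNodes>" for each rack in
# `hdfs dfsadmin -printTopology` output read from stdin, sorted by rack name.
rack_summary() {
  awk '
    /^Rack: / { rack = $2; racks[rack] = 0; next }   # start a new rack
    NF > 0    { racks[rack]++ }                      # each non-blank line is a DataNode
    END       { for (r in racks) print r, racks[r] }
  ' | sort
}

# Usage on the cluster:
#   hdfs dfsadmin -printTopology | rack_summary
```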

Also, check the DataNode with the low block count for any disk-level issues.
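A simple starting point for disk-level checks is to see how full the DataNode's data directory is. This is a minimal sketch. It assumes the data directory is the /hadoop/dfs/data volume seen in the logs and uses an arbitrary 90% threshold. The name check_data_dir is hypothetical. It only covers free space. Failed or read-only volumes need separate checks, for example by reading the DataNode log.

```shell
# Hypothetical helper: report how full a data directory is and flag it when
# usage reaches a threshold. Defaults are assumptions: /hadoop/dfs/data (the
# volume seen in the DataNode logs) and an arbitrary 90% threshold.
check_data_dir() {
  local dir="${1:-/hadoop/dfs/data}"
  local threshold="${2:-90}"
  local used
  # With -P, df prints one line per filesystem; column 5 is "Capacity" (e.g. "42%").
  used=$(df -P "$dir" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  if [ -z "$used" ]; then
    echo "ERROR cannot read usage for $dir"
    return 1
  fi
  if [ "$used" -ge "$threshold" ]; then
    echo "WARN $dir is ${used}% full"
  else
    echo "OK $dir is ${used}% full"
  fi
}

# Usage, run on (or inside the container of) each DataNode:
#   check_data_dir /hadoop/dfs/data 85
```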

Try enabling DEBUG logging for block placement:

log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy=DEBUG
log4j.logger.org.apache.hadoop.hdfs.protocol.BlockStoragePolicy=DEBUG
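The NameNode decides block placement, so the two logger lines above belong in the NameNode's log4j.properties. They normally take effect after a restart. To try them on a running NameNode instead, Hadoop's daemonlog command can change a logger's level through the NameNode's HTTP address. This is a minimal sketch. It assumes that address is passed in as host:port (9870 is the Hadoop 3 default web UI port). The name enable_placement_debug is hypothetical. Levels set this way are lost when the NameNode restarts.

```shell
# Hypothetical helper: raise both block-placement loggers to DEBUG on a
# running NameNode via `hadoop daemonlog`. Not persisted across restarts.
enable_placement_debug() {
  if [ -z "$1" ]; then
    echo "usage: enable_placement_debug NAMENODE_HOST:HTTP_PORT" >&2
    return 2
  fi
  local nn_http="$1"
  local logger
  for logger in \
      org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy \
      org.apache.hadoop.hdfs.protocol.BlockStoragePolicy; do
    hadoop daemonlog -setlevel "$nn_http" "$logger" DEBUG
  done
}

# Usage (hostname is a placeholder):
#   enable_placement_debug namenode-host:9870
```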