
Missing blocks, IPC max length increased, DNs not connecting to NN

New Contributor

Hi, can someone suggest what the problem could be? I have a large volume of missing blocks, and the NameNode log errors indicated that the IPC max length was not enough. I increased it, but that didn't help much: before and after the increase, the DataNodes still report that the replica cache file doesn't exist, and I don't see the DataNodes actually trying to connect to the NameNode.

The setup is very small: 2 servers, with one server running both the NN and a DN, and the second server running a DN. Cloudera Express 6.3.1 is used.
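For context, the "IPC max length" the NameNode complains about is the Hadoop property ipc.maximum.data.length (default 67108864 bytes, i.e. 64 MB); that this is the setting that was raised is an assumption. A quick way to check what value the configuration carries (note this reads the client-side config on the host where it runs, so a NameNode that was not restarted after the change may still hold the old value):

# Print the configured IPC payload limit (default is 67108864 bytes = 64 MB).
hdfs getconf -confKey ipc.maximum.data.length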

 

Configured Capacity: 3787349868544 (3.44 TB)
Present Capacity: 3593183903744 (3.27 TB)
DFS Remaining: 3235560472576 (2.94 TB)
DFS Used: 357623431168 (333.06 GB)
DFS Used%: 9.95%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 7899994
Missing blocks (with replication factor 1): 405
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
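(The report above looks like hdfs dfsadmin -report output. The same command can also split live and dead nodes, which would confirm whether the NameNode currently sees any DataNodes at all; a sketch, assuming the Hadoop 3 CLI shipped with CDH 6:)

# List the DataNodes the NameNode considers alive vs. dead.
hdfs dfsadmin -report -live
hdfs dfsadmin -report -dead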

 

Both DN1 & DN2

2024-03-24 10:16:51,876 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-1052250670-10.53.XXX-XX-XXXX981591679 on volume /opt/hadoop/dfs/dn...
2024-03-24 10:16:51,876 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice: Replica Cache file: /opt/hadoop/dfs/dn/current/BP-1052250670-10.53.XXX-XX-XXXX981591679/current/replicas doesn't exist
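As a side note, the "Replica Cache file ... doesn't exist" message is usually harmless on its own: that file is a cache the DataNode writes on clean shutdown, and when it is absent the DataNode simply rescans the volume to rebuild its replica map. A more telling check is whether each DataNode can reach the NameNode RPC port at all; a sketch, where <nn-host> is a placeholder and the log path assumes a Cloudera Manager layout:

# From each DataNode host: can we reach the NameNode RPC port (8020, per the NameNode log below)?
nc -vz <nn-host> 8020

# Do the DataNode logs show any registration or heartbeat activity?
grep -iE "register|heartbeat|connect" /var/log/hadoop-hdfs/*DATANODE*.log.out | tail -20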

NameNode
2024-03-24 10:56:59,872 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 28 Total time for transactions(ms): 15 Number of transactions batched in Syncs: 2 Number of syncs: 26 SyncTimes(ms): 64
2024-03-24 10:56:59,889 INFO org.apache.hadoop.ipc.Server: IPC Server handler 13 on 8020, call Call#23851 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.53.235.25:39568
java.io.IOException: File /opt/logs/hdfs_canary_health/.canary_file_2024_03_24-10_56_59.f684be6f08559e9f could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and 0 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2102)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2673)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:872)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:550)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
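The key line above is "There are 0 datanode(s) running": the NameNode has no registered DataNodes, which by itself explains both the canary write failure and the huge missing-block count. A hedged way to watch the NameNode side for DataNodes registering (log path again assumes a Cloudera Manager layout):

# On the NameNode host: watch for DataNode registration events as DNs come up.
tail -f /var/log/hadoop-hdfs/*NAMENODE*.log.out | grep -i "registerDatanode"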

2 Replies

Community Manager

@user2024, welcome to our community! To help you get the best possible answer, I have tagged our Cloudera Manager experts @upadhyayk04 @utrivedi @Rajat_710 @Raamar, who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager



Expert Contributor

Hi @user2024, I don't think the canary file is going to cause this issue. The blocks that are corrupt/missing are now lost and cannot be recovered. You can identify the affected files with the command below, manually delete them, and then run the HDFS balancer so that blocks are balanced across the cluster.

hdfs fsck / -list-corruptfileblocks
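To follow that through, a sketch of the cleanup sequence (the -delete step is destructive and the threshold value is illustrative, so review the corrupt-file list first):

# Delete the files whose blocks are unrecoverable.
# WARNING: destructive; only run after reviewing the fsck output above.
hdfs fsck / -delete

# Then rebalance data across the DataNodes (threshold = allowed utilization spread, in percent).
hdfs balancer -threshold 10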

You can also refer to the Stack Overflow thread below.

https://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hdfs-files