
DATANODE + Failed to replace a bad datanode on the existing pipeline


Hi all,

We have an Ambari cluster with 4 datanode machines (workers), and each worker machine has one 1 TB disk.

Before I explain the problem, I want to make clear that we verified the following and saw no problems in any of these areas:

1. The cluster is working without network problems.

2. We checked DNS, and hostname resolution is correct.

3. The Java heap size for HDFS was increased to 8 GB (so no problem with Java heap size).

4. We ran the HDFS service check, and it reported no issues.

5. To resolve this issue, we set the following two properties from Ambari > HDFS > Configs > Custom hdfs-site > Add Property (see the sketch below this list):

dfs.client.block.write.replace-datanode-on-failure.enable=NEVER

dfs.client.block.write.replace-datanode-on-failure.policy=NEVER
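
For reference, this is how the two properties would look in hdfs-site.xml (a sketch of the standard client-side settings; note that, per the HDFS documentation, dfs.client.block.write.replace-datanode-on-failure.enable is a boolean, so NEVER is not a valid value for it, while the policy property accepts DEFAULT, ALWAYS, or NEVER):

    <property>
      <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
      <value>true</value> <!-- boolean: true/false; NEVER is not valid here -->
    </property>
    <property>
      <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
      <value>NEVER</value> <!-- DEFAULT, ALWAYS, or NEVER -->
    </property>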

But we still have the problem.



NOW, let's talk about the problem:

On one of the worker machines we see the following in a YARN container's stderr:

 tail -f /grid/sdb/hadoop/yarn/log/application_1523836627832749_4432/container_e23_1592736529519_4432_01_000041/stderr


---2018-07-12T20:51:28.028 ERROR [driver][][] [org.apache.spark.scheduler.LiveListenerBus] Listener EventLoggingListener threw an exception
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[45.23.2.56:50010,DS-f5c5260a-20b1-43f4-b8fd-53e88db2e48e,DISK], DatanodeInfoWithStorage[45.23.2.56:50010,DS-b4758979-52a2-4238-99f0-1b5ec45a7e25,DISK]], original=[DatanodeInfoWithStorage[45.23.2.56:50010,DS-f5c5260a-20b1-43f4-b8fd-53e88db2e48e,DISK], DatanodeInfoWithStorage[45.23.2.56:50010,DS-b4758979-52a2-4238-99f0-1b5ec45a7e25,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1059)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1122)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1280)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1005)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:512)
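
Note that the stack trace says "The current failed datanode replacement policy is DEFAULT", which suggests the writing client (here, the Spark driver's EventLoggingListener) never picked up the override: these are client-side properties, so they must be visible to the process that writes to HDFS, not only to the namenode/datanode configs. As a sketch (assuming the job is launched with spark-submit), they can be passed into the job's Hadoop configuration via the spark.hadoop.* prefix:

    spark-submit \
      --conf spark.hadoop.dfs.client.block.write.replace-datanode-on-failure.policy=NEVER \
      --conf spark.hadoop.dfs.client.block.write.replace-datanode-on-failure.best-effort=true \
      ...   # rest of the job arguments

(dfs.client.block.write.replace-datanode-on-failure.best-effort is available from Hadoop 2.7 on; when set to true, the writer keeps going with the remaining datanodes instead of failing if a replacement cannot be found.)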

We can see the error: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available.

What else can we do in order to resolve the "Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available" error?

Michael-Bronson
13 REPLIES


@Geoffrey unfortunately, after setting the variables we still have the issue.

We restarted the HDFS service and also the worker machine, but we still see "Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try".
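
As a quick sanity check after the restarts (a sketch using the standard HDFS CLI), we can confirm that all four datanodes are actually live and none are dead or decommissioning:

    hdfs dfsadmin -report | grep -E 'Live datanodes|Dead datanodes|Decommission'

With 4 workers and replication factor 3, even one dead or overloaded datanode can leave a write pipeline with no replacement candidate.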

(screenshot attached: 80511-capture.png)

Michael-Bronson

Master Mentor

@Michael Bronson

Do you still have the below parameter set to true? Remove it completely, retry, and please paste the error:

dfs.client.block.write.replace-datanode-on-failure.enable 
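
You can also verify what the client configuration actually resolves to after the change (a quick check, run on the node where the Spark job is launched and assuming its HDFS client configs are up to date):

    hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.enable
    hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.policy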

Regards


@Geoffrey yes, I already removed it, and the error is exactly the same as we have already seen.

Michael-Bronson


@Geoffrey, any suggestion on how to continue from this point?

Michael-Bronson