DATANODE + Failed to replace a bad datanode on the existing pipeline


Hi all,

We have an Ambari cluster with 4 DataNode machines (workers), and each worker machine has one 1 TB disk.

Before I explain the problem, I want to make clear that we verified the following and saw no problems in any of these areas:

1. The cluster is working without network problems.

2. We checked DNS, and hostname resolution works correctly.

3. The Java heap size for HDFS was increased to 8G (so there is no problem with Java heap size).

5. We ran the HDFS service check and found no issues.

6. We set the following:

To resolve this issue, we set the following two properties from Ambari > HDFS > Configs > Custom HDFS site > Add Property:



But we still have the problem.
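
The two properties are not shown in the post. The error message itself names `dfs.client.block.write.replace-datanode-on-failure.policy`, and Hadoop's `hdfs-default.xml` documents a sibling key, `dfs.client.block.write.replace-datanode-on-failure.enable`. As a minimal sketch, assuming those are the keys in question, the Python below reads whatever values are present in an `hdfs-site.xml`, which may help confirm what the client actually sees. The sample XML, its values, and the function name are hypothetical, not the settings from this cluster.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content. The values are assumptions for
# illustration only, not the settings used in this thread.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>DEFAULT</value>
  </property>
</configuration>"""

PREFIX = "dfs.client.block.write.replace-datanode-on-failure."


def replace_datanode_settings(xml_text):
    """Return {property name: value} for the replace-datanode-on-failure keys."""
    root = ET.fromstring(xml_text)
    settings = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith(PREFIX):
            settings[name] = (prop.findtext("value") or "").strip()
    return settings


if __name__ == "__main__":
    for key, value in replace_datanode_settings(SAMPLE_HDFS_SITE).items():
        print(key, "=", value)
```

Because these are `dfs.client.*` settings, the copy of `hdfs-site.xml` that matters is probably the one loaded by the writing client (here the Spark driver), so it may be worth running the check against that file.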

Now, let's talk about the problem:

On one of the worker machines we see the following:

 tail -f /grid/sdb/hadoop/yarn/log/application_1523836627832749_4432/container_e23_1592736529519_4432_01_000041/stderr

---2018-07-12T20:51:28.028 ERROR [driver][][] [org.apache.spark.scheduler.LiveListenerBus] Listener EventLoggingListener threw an exception Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[,DS-f5c5260a-20b1-43f4-b8fd-53e88db2e48e,DISK], DatanodeInfoWithStorage[,DS-b4758979-52a2-4238-99f0-1b5ec45a7e25,DISK]], original=[DatanodeInfoWithStorage[,DS-f5c5260a-20b1-43f4-b8fd-53e88db2e48e,DISK], DatanodeInfoWithStorage[,DS-b4758979-52a2-4238-99f0-1b5ec45a7e25,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(
        at org.apache.hadoop.hdfs.DFSOutputStream$

We can see the error: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available
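
One thing that can be read from the message is the `Nodes:` section: `current` and `original` list the same two storage IDs, which suggests the pipeline was left with two datanodes and no new one was added. A small sketch that extracts those IDs from the message text follows. The regular expression and function name are illustrative assumptions, and the sample string is an excerpt of the log line above.

```python
import re

# Excerpt of the error message from the stderr log above.
SAMPLE_MESSAGE = (
    "(Nodes: current=[DatanodeInfoWithStorage[,DS-f5c5260a-20b1-43f4-b8fd-53e88db2e48e,DISK], "
    "DatanodeInfoWithStorage[,DS-b4758979-52a2-4238-99f0-1b5ec45a7e25,DISK]], "
    "original=[DatanodeInfoWithStorage[,DS-f5c5260a-20b1-43f4-b8fd-53e88db2e48e,DISK], "
    "DatanodeInfoWithStorage[,DS-b4758979-52a2-4238-99f0-1b5ec45a7e25,DISK]])."
)

# Storage IDs in this log look like "DS-" followed by a lowercase hex UUID.
STORAGE_ID = re.compile(r"DS-[0-9a-f-]+")


def pipeline_storages(message):
    """Return (current, original) lists of storage IDs found in the error text."""
    current_part, _, original_part = message.partition("original=")
    return STORAGE_ID.findall(current_part), STORAGE_ID.findall(original_part)


if __name__ == "__main__":
    current, original = pipeline_storages(SAMPLE_MESSAGE)
    print("current :", current)
    print("original:", original)
    print("replacement added:", current != original)
```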

What else can we do to resolve the error "Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available"?
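
For context on why the client asks for a replacement at all, Hadoop's `hdfs-default.xml` describes the DEFAULT policy roughly as follows: with replication r and n datanodes left in the pipeline, a new datanode is added only if r >= 3 and either floor(r/2) >= n, or r > n and the block is hflushed/appended. Below is a sketch of that rule in Python, assuming replication 3 (the thread does not state it) and the two remaining datanodes from the message. The function name is hypothetical.

```python
def default_policy_wants_replacement(replication, existing, hflushed_or_appended):
    """Sketch of the DEFAULT replace-datanode-on-failure rule as documented.

    replication: the file's replication factor (r)
    existing: datanodes still in the write pipeline (n)
    hflushed_or_appended: whether the block was hflushed or opened for append
    """
    if replication < 3:
        return False
    if replication // 2 >= existing:
        return True
    return replication > existing and hflushed_or_appended


if __name__ == "__main__":
    # Assumed values: replication 3, two datanodes left in the pipeline.
    print(default_policy_wants_replacement(3, 2, True))
    print(default_policy_wants_replacement(3, 2, False))
```

Under those assumptions, a replacement is requested only for a stream that was hflushed or appended. On a 4-DataNode cluster that leaves few candidates for the replacement, so the state of the remaining DataNodes seems worth checking.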



@Geoffrey, unfortunately, after setting the variables we still have the issue.

We restarted the HDFS service and also the worker machine, but we still see "Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try".



Master Mentor

@Michael Bronson

Do you still have the parameter below set to true? Remove it completely, retry, and please paste the error.




@Geoffrey, yes, I already removed it, and the error is exactly the same as the one we have already seen.



@Geoffrey, any suggestions on how to continue from this point?
