<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question DATANODE + Failed to replace a bad datanode on the existing pipeline in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/DATANODE-Failed-to-replace-a-bad-datanode-on-the-existing/m-p/184142#M146289</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;We have an Ambari cluster with 4 datanode (worker) machines,
and each worker machine has one disk of 1 TB.&lt;/P&gt;&lt;P&gt;Before describing the problem, I want to make clear that we verified
the following points and found no issue with any of them:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;1. The cluster is working without network problems.&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;2. We checked DNS, and hostname resolution is correct.&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;3. The HDFS Java heap size was increased to 8 GB (so no problem with
the Java heap size).&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;4. We ran the HDFS service check, and there is no issue there.&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;5. To resolve this issue, we set the following two properties
from Ambari &amp;gt; HDFS &amp;gt; Configs &amp;gt; Custom HDFS site &amp;gt; Add Property:&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;dfs.client.block.write.replace-datanode-on-failure.enable=NEVER&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;dfs.client.block.write.replace-datanode-on-failure.policy=NEVER&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;But we still have the problem.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Now, the problem itself: on one of the worker machines we see the following:&lt;/P&gt;&lt;PRE&gt; tail -f /grid/sdb/hadoop/yarn/log/application_1523836627832749_4432/container_e23_1592736529519_4432_01_000041/stderr


---2018-07-12T20:51:28.028 ERROR [driver][][] [org.apache.spark.scheduler.LiveListenerBus] Listener EventLoggingListener threw an exception
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[45.23.2.56:50010,DS-f5c5260a-20b1-43f4-b8fd-53e88db2e48e,DISK], DatanodeInfoWithStorage[45.23.2.56:50010,DS-b4758979-52a2-4238-99f0-1b5ec45a7e25,DISK]], original=[DatanodeInfoWithStorage[45.23.2.56:50010,DS-f5c5260a-20b1-43f4-b8fd-53e88db2e48e,DISK], DatanodeInfoWithStorage[45.23.2.56:50010,DS-b4758979-52a2-4238-99f0-1b5ec45a7e25,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1059)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1122)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1280)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1005)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:512)
&lt;/PRE&gt;&lt;P&gt;We can see the error: &lt;STRONG&gt;java.io.IOException: Failed to replace a bad datanode
on the existing pipeline due to no more good datanodes being available&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;What
else can we do in order to resolve the error "&lt;STRONG&gt;Failed to replace a bad
datanode on the existing pipeline due to no more good datanodes being
available&lt;/STRONG&gt;"?&lt;/P&gt;</description>
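One detail worth double-checking in the configuration described above: according to Hadoop's hdfs-default.xml, the `.enable` property is a boolean (true/false), while NEVER/DEFAULT/ALWAYS are values of the `.policy` property only, so `enable=NEVER` is not a recognized value. A minimal hdfs-site.xml sketch of these client-side settings (property names are from hdfs-default.xml; the values shown are illustrative choices, not the cluster's live configuration):

```xml
<!-- Client-side replace-datanode-on-failure settings (hdfs-default.xml). -->
<property>
  <!-- Boolean switch: true or false (default true). "NEVER" is not a
       legal value here; that value belongs to the .policy property. -->
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <!-- One of DEFAULT, ALWAYS, NEVER; only consulted when enable=true. -->
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>
<property>
  <!-- If true, the client continues the write even when a replacement
       datanode cannot be found, at the risk of reduced replication. -->
  <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
  <value>true</value>
</property>
```

Note that with only 4 datanodes (the setup described above), pipeline recovery has few replacement candidates, which is the situation these properties are meant to address.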
    <pubDate>Fri, 13 Jul 2018 04:13:36 GMT</pubDate>
    <dc:creator>mike_bronson7</dc:creator>
    <dc:date>2018-07-13T04:13:36Z</dc:date>
  </channel>
</rss>

