Created 07-12-2018 09:13 PM
Hi all,
We have an Ambari cluster with 4 DataNode machines (workers), and each worker machine has one 1 TB disk.
Before I explain the problem, I want to make clear that we verified the following points and did not find any problem with them:
1. The cluster is working without network problems.
2. We checked DNS, and hostname resolution is correct.
3. The Java heap size for HDFS was increased to 8G (so no problem with the Java heap size).
4. We ran the HDFS service check, and there is no issue there.
5. We set the following:
To resolve this issue, we set the following two properties from Ambari > HDFS > Configs > Custom hdfs-site > Add Property (a minimal client-side sketch of the same settings follows the list):
dfs.client.block.write.replace-datanode-on-failure.enable=NEVER
dfs.client.block.write.replace-datanode-on-failure.policy=NEVER
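For reference, this is roughly how the same two client-side settings can be applied programmatically in a standalone Hadoop client. This is only a minimal sketch: the class name and the test path are illustrative, and note that the *.enable property is documented as a boolean (default true), while *.policy accepts DEFAULT, ALWAYS or NEVER.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PipelineRecoverySettingsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Client-side pipeline-recovery settings discussed in this thread.
        // "...enable" is a boolean; "...policy" is DEFAULT, ALWAYS or NEVER.
        conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
        conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");

        // Illustrative write, only to show that these are settings of the
        // writing client (here, the Spark driver), not of the DataNodes.
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/tmp/pipeline-recovery-test"))) {
            out.writeUTF("test");
        }
    }
}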
But even with these settings we still have the problem.
Now, let's talk about the problem.
On one of the worker machines we see the following:
tail -f /grid/sdb/hadoop/yarn/log/application_1523836627832749_4432/container_e23_1592736529519_4432_01_000041/stderr

2018-07-12T20:51:28.028 ERROR [driver][][] [org.apache.spark.scheduler.LiveListenerBus] Listener EventLoggingListener threw an exception
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[45.23.2.56:50010,DS-f5c5260a-20b1-43f4-b8fd-53e88db2e48e,DISK], DatanodeInfoWithStorage[45.23.2.56:50010,DS-b4758979-52a2-4238-99f0-1b5ec45a7e25,DISK]], original=[DatanodeInfoWithStorage[45.23.2.56:50010,DS-f5c5260a-20b1-43f4-b8fd-53e88db2e48e,DISK], DatanodeInfoWithStorage[45.23.2.56:50010,DS-b4758979-52a2-4238-99f0-1b5ec45a7e25,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1059)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1122)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1280)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1005)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:512)
We can see the error: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try.
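Since the exception means pipeline recovery could not find a replacement DataNode, one quick check from the client side is to ask the NameNode how many DataNodes it currently reports as live and how much space each one has left. A minimal sketch, assuming a Hadoop 2.x client with the cluster configuration on its classpath and fs.defaultFS pointing at HDFS (the class name is just illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;

public class ListLiveDatanodes {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            // Assumes the default filesystem is HDFS.
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            // If fewer DataNodes are LIVE than the file's replication factor,
            // pipeline recovery has no candidate to add and fails as above.
            for (DatanodeInfo dn : dfs.getDataNodeStats(DatanodeReportType.LIVE)) {
                System.out.printf("%s remaining=%d GB%n",
                        dn.getXferAddr(), dn.getRemaining() / (1024L * 1024 * 1024));
            }
        }
    }
}

The same numbers should also show up in the output of hdfs dfsadmin -report.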
What else can we do in order to resolve the "Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available" error?
Created on 07-15-2018 12:06 PM - edited 08-18-2019 01:36 AM
@Geoffrey unfortunately, after setting the variables we still have the issue.
We restarted the HDFS service and also the worker machines, but we still see "Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try".
Created 07-15-2018 12:20 PM
@Geoffrey yes, I already removed it, and the error is exactly the same error as we have already seen.
Created 07-15-2018 03:34 PM
@Geoffrey, any suggestion on how to continue from this point?