Support Questions
Find answers, ask questions, and share your expertise

Problem bringing a data node back online after a crash



We are testing various failure modes to ensure there is no essential data loss. We have a 3-node cluster with a replication factor of 2. To simulate a crash, we power off Node C and leave it off for at least 10 minutes, then restart it. After Node C comes back up, the files that had blocks on it show as under-replicated. I then issue the command
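For context, this is roughly how I check which blocks are under-replicated after Node C comes back (the path `/` is just an example; run against whatever directory holds the affected files):

```shell
# Run fsck as the hdfs superuser and report block-level detail,
# then filter for blocks the NameNode still considers under-replicated
sudo -u hdfs hdfs fsck / -blocks -locations | grep -i "under replicated"
```

The summary at the end of the fsck output also shows the total count of under-replicated blocks for the tree.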


sudo -u hdfs hadoop fs -setrep -w 2 file


to try to fix the under-replicated blocks. However, the command never returns. What I see in the data node log on Node C is that Hadoop is trying to write this file to Node C to achieve a replication of 2, but the file already exists on Node C. I get the following error:


2020-05-04 13:57:58,169 INFO datanode.DataNode ( - vault-svr3.vicads230.local:50010:DataXceiver error processing WRITE_BLOCK operation src: / dst: /; org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-165901586- already exists in state FINALIZED and thus cannot be created.


My question is, what is the best way to bring up a data node after a crash, assuming that the crash did not corrupt data on the disks?


Thanks, David

