
Problem bringing a data node back online after a crash

We are testing various failure modes to ensure there is no loss of essential data. We have a 3-node cluster with a replication factor of 2. To simulate a crash, we power off Node C and leave it off for at least 10 minutes, then restart it. After Node C comes back up, the files that had blocks on Node C show as under-replicated. I then issue the command

sudo -u hdfs hadoop fs -setrep -w 2 file

to try to fix the under-replicated blocks. However, the command never returns. What I see in the DataNode log on Node C is that Hadoop is trying to write a block of this file to Node C to achieve a replication factor of 2, but a replica of that block already exists on Node C. I get the following error:


2020-05-04 13:57:58,169 INFO datanode.DataNode (DataXceiver.java:run(305)) - vault-svr3.vicads230.local:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.1.31.232:39928 dst: /10.1.31.233:50010; org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-165901586-10.1.31.231-1588389285182:blk_1073741928_7575 already exists in state FINALIZED and thus cannot be created.
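
For reference, while setrep hangs I check the replication state from the NameNode's point of view roughly like this (a sketch of my checks; the path and grep pattern are just examples from our test setup):

# list the files/blocks the NameNode still considers under-replicated
sudo -u hdfs hdfs fsck / -files -blocks -locations | grep -i "under replicated"

# confirm the NameNode sees Node C as a live DataNode again
sudo -u hdfs hdfs dfsadmin -report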


My question is, what is the best way to bring up a data node after a crash, assuming that the crash did not corrupt data on the disks?
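
One idea I have been toying with, though I am not sure it is the right fix, is to force the restarted DataNode to resend its full block report so the NameNode re-registers the replicas that already exist on Node C instead of scheduling new writes to it. A sketch (50020 is the default DataNode IPC port on our pre-3.0 cluster; adjust if yours differs):

# ask Node C's DataNode to send a full block report to the NameNode
sudo -u hdfs hdfs dfsadmin -triggerBlockReport 10.1.31.233:50020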


Thanks, David
