Concern about Replication during Scheduled NameNode Maintenance

Expert Contributor

CentOS 6.6
CDH 5.1.2


I would like to take down a DataNode temporarily (say, for 24 hours). Some questions:

  1. For normally replicated blocks (target replication factor=3), can I stop HDFS from automatically re-replicating those blocks?
  2. For un-replicated blocks (replication factor=1), can I do anything to relocate those blocks in advance, in case they are on the DataNode to be taken down?

I understand that I risk data loss, but the data is not critical anyway.

Thanks.

1 ACCEPTED SOLUTION

Mentor
For (1), the answer right now is no. Once dead-node detection occurs, the NameNode will swiftly act to re-replicate the identified lost replicas. Something along the lines of what you need is being worked on upstream via https://issues.apache.org/jira/browse/HDFS-7877, but the work is still in progress and will only arrive in a future, as yet undetermined, CDH release.

For (2), you can hunt down the files with a replication factor of 1, raise them to 2, and wait for the under-replicated block count to reach 0 before you take the DN down. The replication factor can be changed with the command 'hadoop fs -setrep'.
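
As a rough sketch of the approach for (2), and not a tested procedure for any particular CDH release, the search and the change could be scripted along these lines. The function names `list_rf1_files` and `raise_to_two` and the `/data` path are made up for illustration. The filter assumes the usual `hadoop fs -ls -R` column layout, in which the second column holds the replication factor and the eighth column starts the path.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the /data path are illustrative assumptions.

# Read `hadoop fs -ls -R` output on stdin and print the path of every
# regular file whose replication factor (second column) is 1.
# Directories are skipped because their permission string starts with "d".
list_rf1_files() {
  awk '$1 ~ /^-/ && $2 == 1 {
         path = $8
         # Re-join paths that contain spaces (runs of spaces collapse to one).
         for (i = 9; i <= NF; i++) path = path " " $i
         print path
       }'
}

# Read paths on stdin and raise each one to replication factor 2.
# -w makes setrep wait until the new replication target is met.
raise_to_two() {
  while IFS= read -r path; do
    hadoop fs -setrep -w 2 "$path" < /dev/null
  done
}

# Example invocation (not executed here):
#   hadoop fs -ls -R /data | list_rf1_files | raise_to_two
```

Before taking the DN down, the under-replicated block count can be checked in the summary printed by 'hdfs fsck /' or 'hdfs dfsadmin -report'. Note that '-w' can take a long time on large files, so it may be preferable to drop it and poll the count instead.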
