Data Node maintenance

Expert Contributor

Data Center guys are taking down one of my data nodes for battery replacement. This maintenance is going to take 90 minutes. Through Ambari, I am going to put the node in maintenance mode and bring down all the services.

Is this sufficient?

Accepted Solution

Re: Data Node maintenance

@Kumar Veerappan

Assuming the HDFS replication factor is greater than 1 (default is 3), put the node in maintenance mode and stop the services running on it. Once the server comes back up, start the services and then take the node out of maintenance mode, in that order. Putting the node in maintenance mode before stopping the services eliminates the alerts that stopping them would otherwise raise; likewise, starting the services before taking the node out of maintenance mode prevents alerts while they come up.
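
The order of operations above can be sketched against the Ambari REST API. This is a minimal Python sketch, assuming the Ambari v1 endpoints for host maintenance mode (`Hosts/maintenance_state`) and for stopping or starting every component on a host (`HostRoles/state`, where `INSTALLED` means stopped); the Ambari URL, cluster name, host name, and credentials are hypothetical placeholders. Check the request bodies against your Ambari version before relying on them.

```python
import base64
import json
import urllib.request

# Hypothetical placeholders: substitute your own Ambari server, cluster and host.
AMBARI_URL = "http://ambari.example.com:8080"
CLUSTER = "mycluster"
HOST = "datanode07.example.com"


def _maintenance(host_path, host, state):
    # Assumed body for toggling host-level maintenance mode ("ON" / "OFF").
    return ("PUT", host_path, {
        "RequestInfo": {"context": f"Turn {state} Maintenance Mode for {host}"},
        "Body": {"Hosts": {"maintenance_state": state}},
    })


def _components(host_path, host, state):
    # Assumed body for setting every component on the host to one state:
    # "INSTALLED" means stopped, "STARTED" means running.
    return ("PUT", host_path + "/host_components", {
        "RequestInfo": {"context": f"Set all components on {host} to {state}"},
        "Body": {"HostRoles": {"state": state}},
    })


def maintenance_plan(cluster, host):
    """Return (before_outage, after_outage) lists of (method, path, body)."""
    host_path = f"/api/v1/clusters/{cluster}/hosts/{host}"
    before = [
        _maintenance(host_path, host, "ON"),        # 1. suppress alerts first
        _components(host_path, host, "INSTALLED"),  # 2. then stop services
    ]
    after = [
        _components(host_path, host, "STARTED"),    # 3. start services first
        _maintenance(host_path, host, "OFF"),       # 4. then leave maintenance
    ]
    return before, after


def send(method, path, body, user, password):
    """Send one call; Ambari expects the X-Requested-By header on writes."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        AMBARI_URL + path,
        data=json.dumps(body).encode(),
        method=method,
        headers={"X-Requested-By": "ambari", "Authorization": "Basic " + token},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Dry run: print the calls instead of sending them.
    before, after = maintenance_plan(CLUSTER, HOST)
    for method, path, body in before + after:
        print(method, path, json.dumps(body["Body"]))
```

Stop and start requests are typically asynchronous in Ambari, so in practice you would wait for each one to finish (for example in the background operations view) before sending the next call.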

It is unlikely that your data node will fall far behind after a 90-minute outage, but you may consider running the HDFS balancer with your threshold (default is 10%).
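
For reference, the HDFS balancer (`hdfs balancer -threshold 10`) treats a data node as balanced when its disk utilization is within the threshold, in percentage points, of the cluster-wide utilization. The minimal Python sketch below applies that criterion to hypothetical per-node usage figures to show which nodes would count as over- or under-utilized; it does not move any blocks, and the node names and numbers are made up for illustration.

```python
def nodes_outside_threshold(nodes, threshold_pct=10.0):
    """Return (over, under) lists of hosts outside the balancer threshold.

    nodes: dict of hostname -> (used_bytes, capacity_bytes).
    A node counts as balanced when its utilization (used / capacity, in percent)
    is within threshold_pct percentage points of the cluster-wide utilization
    (total used / total capacity).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    cluster_util = 100.0 * total_used / total_cap

    over, under = [], []
    for host, (used, cap) in sorted(nodes.items()):
        util = 100.0 * used / cap
        if util > cluster_util + threshold_pct:
            over.append(host)   # candidate source of block moves
        elif util < cluster_util - threshold_pct:
            under.append(host)  # candidate target of block moves
    return over, under


if __name__ == "__main__":
    # Hypothetical cluster: cluster-wide utilization is 135 / 300 = 45%.
    usage = {"dn1": (60, 100), "dn2": (55, 100), "dn3": (20, 100)}
    print(nodes_outside_threshold(usage))
```

With the default 10% threshold, dn1 (60%) is above the 55% upper bound and dn3 (20%) is below the 35% lower bound, while dn2 (55%) is still considered balanced.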

+++++

If any of the responses helped, please vote and accept the best answer.

Re: Data Node maintenance

@Kumar Veerappan

If you are taking only one data node down, you don't need downtime for the cluster, because it is not going to harm the data. The blocks on that node fall into the under-replicated category, and the framework takes care of re-replicating them from the remaining copies; once the node comes back, it will receive new data again. As an end user, there shouldn't be any issue.
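
How soon re-replication starts depends on when the NameNode declares the stopped data node dead: after 2 × `dfs.namenode.heartbeat.recheck-interval` + 10 × `dfs.heartbeat.interval`, which with the stock defaults (5 minutes and 3 seconds) is 10.5 minutes. A 90-minute outage is well past that, so expect the node's blocks to be re-replicated elsewhere while it is down. The small Python helper below does that arithmetic; the function names are hypothetical, and your cluster's hdfs-site.xml may override either default.

```python
def dead_node_timeout_s(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Seconds of silence before the NameNode marks a DataNode dead.

    recheck_interval_ms: dfs.namenode.heartbeat.recheck-interval (default 300000 ms)
    heartbeat_interval_s: dfs.heartbeat.interval (default 3 s)
    """
    return 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s


def outage_triggers_rereplication(outage_minutes, **overrides):
    """True if an outage of this length outlasts the dead-node timeout."""
    return outage_minutes * 60 > dead_node_timeout_s(**overrides)


if __name__ == "__main__":
    print(dead_node_timeout_s())              # 630.0 seconds, i.e. 10.5 minutes
    print(outage_triggers_rereplication(90))  # True for the 90-minute window
```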

If you take downtime for the whole cluster instead, you may see missing-data errors until the data nodes re-register with the NameNode.

Re: Data Node maintenance

Contributor

@Constantin Stanca

I thought the proper way to perform maintenance on a data node is to decommission it, so that the following happens:

  • Data Node - safely replicates the HDFS data to other DNs
  • Node Manager - stops accepting new job requests
  • Region Server - turns on drain mode
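
Underneath, decommissioning a data node amounts to listing the host in the file named by `dfs.hosts.exclude` and asking the NameNode to re-read it with `hdfs dfsadmin -refreshNodes` (NodeManagers follow the same pattern with `yarn.resourcemanager.nodes.exclude-path` and `yarn rmadmin -refreshNodes`). In an Ambari-managed cluster the Decommission action on the host's components handles this for you, so the Python sketch below only illustrates the exclude-file update; the helper name and host names are hypothetical.

```python
# Commands an operator would run after updating the exclude files (not run here).
REFRESH_COMMANDS = [
    ["hdfs", "dfsadmin", "-refreshNodes"],  # NameNode re-reads dfs.hosts.exclude
    ["yarn", "rmadmin", "-refreshNodes"],   # ResourceManager re-reads its exclude list
]


def add_to_exclude(exclude_text, host):
    """Return new exclude-file contents with `host` added.

    The file holds one hostname per line; blank lines and duplicates are dropped
    and the result is sorted so repeated runs produce the same file.
    """
    hosts = {line.strip() for line in exclude_text.splitlines() if line.strip()}
    hosts.add(host.strip())
    return "\n".join(sorted(hosts)) + "\n"


if __name__ == "__main__":
    print(add_to_exclude("dn02.example.com\n", "dn07.example.com"), end="")
    for cmd in REFRESH_COMMANDS:
        print(" ".join(cmd))
```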

In an urgent situation, I could agree with your suggestion.

However, please advise on the right approach in a scenario where you have the luxury of choosing the maintenance window.

[Screenshot attached: datanode-maintenance.png]

Re: Data Node maintenance

@Kumar Veerappan

Please vote and accept the best answer.