Data Node maintenance

Expert Contributor

The data center team is taking down one of my data nodes for a battery replacement. The maintenance is going to take 90 minutes. Through Ambari, I am going to put the node in maintenance mode and bring down all the services.

Is this sufficient?

1 ACCEPTED SOLUTION

Super Guru

@Kumar Veerappan

Assuming HDFS replication factor > 1 (default is 3), put the node under maintenance and stop services running on the node. Once the server comes back up, start services and take the node out of maintenance, in that order. Putting the node under maintenance before stopping the services will eliminate the risk of alerts. Starting the services before taking the node out of maintenance will prevent the alerts as well.
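The ordering above can be sketched as a list of Ambari REST calls. This is only a sketch: the paths, the `maintenance_state` and `HostRoles` fields, and the `operation_level` block reflect my understanding of the Ambari REST API and should be treated as assumptions to check against your Ambari version. The cluster and host names are made up, and nothing is sent over the network.

```python
import json


def maintenance_plan(cluster, host):
    """Build the ordered (method, path, body) calls for one host.

    Assumption: paths and JSON fields follow Ambari's REST API as I
    understand it; verify them against your Ambari version.
    """
    base = f"/api/v1/clusters/{cluster}/hosts/{host}"

    def request_info(context):
        # Assumption: an explicit host-level operation_level lets the
        # request act on a host that is in maintenance mode.
        return {
            "context": context,
            "operation_level": {
                "level": "HOST",
                "cluster_name": cluster,
                "host_names": host,
            },
        }

    def host_state(state):
        return {
            "RequestInfo": request_info(f"Maintenance mode {state}"),
            "Body": {"Hosts": {"maintenance_state": state}},
        }

    def components(state, context):
        return {
            "RequestInfo": request_info(context),
            "Body": {"HostRoles": {"state": state}},
        }

    return [
        # 1. Maintenance mode first, so stopping services raises no alerts.
        ("PUT", base, host_state("ON")),
        # 2. Stop everything on the host (INSTALLED means stopped).
        ("PUT", base + "/host_components",
         components("INSTALLED", "Stop all components")),
        # ... hardware work happens here ...
        # 3. Start services again once the server is back.
        ("PUT", base + "/host_components",
         components("STARTED", "Start all components")),
        # 4. Leave maintenance mode last.
        ("PUT", base, host_state("OFF")),
    ]


# Hypothetical cluster and host names, for illustration only.
for method, path, body in maintenance_plan("mycluster", "dn05.example.com"):
    print(method, path, json.dumps(body))
```

Each call would also need authentication and, as far as I know, an `X-Requested-By` header when actually sent to Ambari.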

It is unlikely that your data node will fall far behind the rest of the cluster in 90 minutes, but once it is back you may consider running the HDFS balancer with your threshold (default is 10%).
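To show what the threshold means: the balancer treats the cluster as balanced when every data node's utilization is within the threshold (in percentage points) of the overall cluster utilization. Here is a minimal sketch of that check. The function name and the usage numbers are made up for illustration, and the real balancer is run with `hdfs balancer -threshold 10`.

```python
def nodes_outside_threshold(nodes, threshold_pct=10.0):
    """Return {host: deviation} for nodes the balancer would act on.

    nodes maps hostname -> (used, capacity) in the same unit.
    Deviation is node utilization minus cluster utilization, in
    percentage points.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity

    outside = {}
    for host, (used, cap) in nodes.items():
        node_pct = 100.0 * used / cap
        deviation = node_pct - cluster_pct
        if abs(deviation) > threshold_pct:
            outside[host] = round(deviation, 1)
    return outside


# Hypothetical cluster: dn4 was down for a while and missed new writes.
usage = {
    "dn1": (70, 100),
    "dn2": (70, 100),
    "dn3": (70, 100),
    "dn4": (50, 100),
}
print(nodes_outside_threshold(usage))  # cluster is at 65%, dn4 is 15 points below
```

If this returns an empty dict for your real numbers, a balancer run at that threshold would have nothing to do.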

+++++

If any of the responses helped, please vote and accept best answer.


4 REPLIES


@Kumar Veerappan

If you are taking only one data node down, you don't need to take downtime, because it is not going to harm the data. Blocks on that node fall into the under-replicated category, and the framework will take care of re-replicating them. When the node comes back, it will start receiving new data again. As an end user, you shouldn't see any issue.

If you take downtime for the whole cluster, you will see missing-data errors until the data nodes re-register.
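To put numbers on the under-replication point: the NameNode declares a data node dead after 2 x `dfs.namenode.heartbeat.recheck-interval` + 10 x `dfs.heartbeat.interval`, which with the stock defaults (300000 ms and 3 s) is 10.5 minutes. A small sketch of that arithmetic, assuming default settings (your cluster may override them); the function names are made up for illustration.

```python
def dead_node_timeout_seconds(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Seconds without heartbeats before the NameNode marks a DataNode dead.

    Defaults mirror dfs.namenode.heartbeat.recheck-interval (ms) and
    dfs.heartbeat.interval (s); check hdfs-site.xml for overrides.
    """
    return 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s


def replicas_left(replication_factor, nodes_down=1):
    """Worst-case live replicas of a block while nodes_down nodes are offline."""
    return max(replication_factor - nodes_down, 0)


outage_s = 90 * 60  # the 90-minute maintenance window from the question
timeout_s = dead_node_timeout_seconds()

print(timeout_s)                 # 630.0 seconds, i.e. 10.5 minutes
print(outage_s > timeout_s)      # True: the node will be marked dead
print(replicas_left(3))          # 2 replicas still readable
```

So with these defaults a 90-minute outage is long enough for the NameNode to mark the node dead and start re-replicating its blocks elsewhere, while reads keep working from the remaining replicas.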

Expert Contributor

@Constantin Stanca

I thought the proper way to do the maintenance on the data node is to decommission it, so it can do the following tasks:

  • Data Node - safely replicates the HDFS data to other DNs
  • Node Manager - stops accepting new job requests
  • Region Server - turns on drain mode

In an urgent situation, I could agree with your suggestion.

However, please advise me on the right approach for a scenario where you have the luxury of choosing the maintenance window.
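For reference, outside Ambari the DataNode and NodeManager parts of a decommission are typically driven by exclude files (the files named by `dfs.hosts.exclude` and `yarn.resourcemanager.nodes.exclude-path`) followed by a refresh. A rough sketch of that flow is below. The helper name and hostnames are made up, the HBase region server step is not covered, and Ambari's own Decommission action may do this differently under the hood.

```python
def add_to_exclude(existing_lines, host):
    """Return the exclude-file host list with `host` added once.

    existing_lines is the current file content split into lines;
    blank lines and surrounding whitespace are dropped.
    """
    hosts = [line.strip() for line in existing_lines if line.strip()]
    if host not in hosts:
        hosts.append(host)
    return hosts


# Commands an admin would run after updating the exclude files, so the
# NameNode and ResourceManager re-read them and start decommissioning.
REFRESH_COMMANDS = [
    ["hdfs", "dfsadmin", "-refreshNodes"],
    ["yarn", "rmadmin", "-refreshNodes"],
]

# Hypothetical hostnames, for illustration only.
current = ["dn01.example.com\n", "\n"]
updated = add_to_exclude(current, "dn05.example.com")
print("\n".join(updated))
for cmd in REFRESH_COMMANDS:
    print(" ".join(cmd))
```

The trade-off is time: decommissioning waits until the node's blocks are copied elsewhere, which is safer than just stopping the node but can take far longer than the maintenance itself.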

[Attached screenshot: 14421-datanode-maintenance.png]


@Kumar Veerappan

Please vote and accept the best answer.