Data Node maintenance
Labels: Apache Hadoop
Created 10-10-2016 05:00 PM
The data center team is taking down one of my data nodes for a battery replacement; the maintenance is going to take 90 minutes. Through Ambari, I am going to put the node in maintenance mode and bring down all the services on it.
Is this sufficient?
Created 10-11-2016 01:17 AM
Assuming HDFS replication factor > 1 (default is 3), put the node under maintenance and stop services running on the node. Once the server comes back up, start services and take the node out of maintenance, in that order. Putting the node under maintenance before stopping the services will eliminate the risk of alerts. Starting the services before taking the node out of maintenance will prevent the alerts as well.
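If you want to script those two steps rather than click through the UI, here is a minimal sketch against Ambari's REST API (not an official procedure): the cluster name, host name, and credentials are placeholders, and the exact request bodies may vary between Ambari versions.

```python
# Sketch only: toggle Ambari maintenance mode and stop/start all host
# components over the REST API. Names and credentials are placeholders.
import requests

AMBARI = "http://ambari-server.example.com:8080/api/v1"   # placeholder server
CLUSTER = "mycluster"                                      # placeholder cluster
HOST = "datanode01.example.com"                            # host going down
AUTH = ("admin", "admin")                                  # placeholder creds
HEADERS = {"X-Requested-By": "ambari"}                     # required header

def set_maintenance(state):  # state: "ON" or "OFF"
    body = {"RequestInfo": {"context": f"Maintenance {state}"},
            "Body": {"Hosts": {"maintenance_state": state}}}
    r = requests.put(f"{AMBARI}/clusters/{CLUSTER}/hosts/{HOST}",
                     json=body, auth=AUTH, headers=HEADERS)
    r.raise_for_status()

def set_all_components(state):  # "INSTALLED" = stopped, "STARTED" = running
    body = {"RequestInfo": {"context": f"Set host components to {state}"},
            "Body": {"HostRoles": {"state": state}}}
    r = requests.put(f"{AMBARI}/clusters/{CLUSTER}/hosts/{HOST}/host_components",
                     json=body, auth=AUTH, headers=HEADERS)
    r.raise_for_status()

# Before the outage: maintenance mode first, then stop services.
set_maintenance("ON")
set_all_components("INSTALLED")

# After the server is back: start services first, then exit maintenance mode.
set_all_components("STARTED")
set_maintenance("OFF")
```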
It is unlikely that your data node will fall far enough behind in 90 minutes to matter, but you may consider running the HDFS balancer afterwards to your preferred threshold (the default is 10%).
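If you do decide to rebalance, a quick way to trigger it is shown below (sketch only, assuming the hdfs CLI is on the PATH and you run it as the HDFS superuser):

```python
# Sketch: run the HDFS balancer with the default 10% threshold after the node
# has rejoined the cluster; adjust the threshold to taste.
import subprocess

subprocess.run(["hdfs", "balancer", "-threshold", "10"], check=True)
```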
+++++
If any of the responses helped, please vote and accept best answer.
Created 10-10-2016 08:04 PM
@Kumar Veerappan
If you are taking down only one data node, you don't need a cluster downtime: the blocks on that node will simply fall into the under-replicated category, the framework will take care of re-replicating them, and the node will start receiving new data again once it comes back. As an end user, you shouldn't see any issue.
If you take downtime for the whole cluster, you may see missing-data errors until the data nodes re-register.
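One way to verify that the framework really has caught up once the node re-registers is to poll the under-replicated block count; a rough sketch follows (the exact report label is assumed and may differ slightly between Hadoop versions):

```python
# Sketch: wait until the NameNode reports no under-replicated blocks.
import subprocess, time

def under_replicated_blocks():
    report = subprocess.run(["hdfs", "dfsadmin", "-report"],
                            capture_output=True, text=True, check=True).stdout
    for line in report.splitlines():
        if "Under replicated blocks" in line:
            return int(line.split(":")[1].strip())
    return 0

while under_replicated_blocks() > 0:
    time.sleep(30)
print("No under-replicated blocks remain.")
```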
Created on 04-05-2017 03:12 PM - edited 08-19-2019 03:21 AM
I thought the proper way to do maintenance on the data node is to decommission it, so that each of its roles is handled cleanly (see the sketch after this list):
- Data Node - safely replicates its HDFS data to other DataNodes
- Node Manager - stops accepting new job requests
- Region Server - turns on drain mode
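For reference, here is a rough sketch of those decommission steps done by hand outside Ambari. The exclude-file paths and the HBase script location are assumptions: they must match your own dfs.hosts.exclude / yarn.resourcemanager.nodes.exclude-path settings and HBase installation, and Ambari exposes equivalent DataNode/NodeManager decommission actions through its UI.

```python
# Sketch only: manual decommission of one host's DataNode, NodeManager,
# and RegionServer roles. File paths below are assumptions.
import subprocess

HOST = "datanode01.example.com"            # placeholder host to decommission

# 1. DataNode: add the host to the HDFS excludes file and refresh.
with open("/etc/hadoop/conf/dfs.exclude", "a") as f:    # assumed path
    f.write(HOST + "\n")
subprocess.run(["hdfs", "dfsadmin", "-refreshNodes"], check=True)

# 2. NodeManager: add the host to the YARN excludes file and refresh.
with open("/etc/hadoop/conf/yarn.exclude", "a") as f:   # assumed path
    f.write(HOST + "\n")
subprocess.run(["yarn", "rmadmin", "-refreshNodes"], check=True)

# 3. RegionServer: drain regions off the host before stopping it.
subprocess.run(["/usr/hdp/current/hbase-client/bin/graceful_stop.sh", HOST],
               check=True)                               # assumed HDP path
```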
In an urgent situation, I could agree with your suggestion.
However, please advise on the right approach in a scenario where you have the luxury of choosing the maintenance window.
Created 11-02-2016 08:57 AM
Please vote and accept the best answer.
