Data Node maintenance
Labels: Apache Hadoop
Created 10-10-2016 05:00 PM
The data center team is taking down one of my data nodes for a battery replacement; the maintenance is going to take 90 minutes. Through Ambari, I am going to put the node in maintenance mode and bring down all the services on it.
Is this sufficient?
Created 10-11-2016 01:17 AM
Assuming HDFS replication factor > 1 (default is 3), put the node under maintenance and stop services running on the node. Once the server comes back up, start services and take the node out of maintenance, in that order. Putting the node under maintenance before stopping the services will eliminate the risk of alerts. Starting the services before taking the node out of maintenance will prevent the alerts as well.
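If you want to script those two steps rather than click through the UI, here is a minimal sketch against Ambari's REST API (not an official procedure): the cluster name, host name, and credentials are placeholders, and the exact request bodies may vary between Ambari versions.

```python
# Sketch only: toggle Ambari maintenance mode and stop/start all host
# components over the REST API. Names and credentials are placeholders.
import requests

AMBARI = "http://ambari-server.example.com:8080/api/v1"   # placeholder server
CLUSTER = "mycluster"                                      # placeholder cluster
HOST = "datanode01.example.com"                            # host going down
AUTH = ("admin", "admin")                                  # placeholder creds
HEADERS = {"X-Requested-By": "ambari"}                     # required header

def set_maintenance(state):  # state: "ON" or "OFF"
    body = {"RequestInfo": {"context": f"Maintenance {state}"},
            "Body": {"Hosts": {"maintenance_state": state}}}
    r = requests.put(f"{AMBARI}/clusters/{CLUSTER}/hosts/{HOST}",
                     json=body, auth=AUTH, headers=HEADERS)
    r.raise_for_status()

def set_all_components(state):  # "INSTALLED" = stopped, "STARTED" = running
    body = {"RequestInfo": {"context": f"Set host components to {state}"},
            "Body": {"HostRoles": {"state": state}}}
    r = requests.put(f"{AMBARI}/clusters/{CLUSTER}/hosts/{HOST}/host_components",
                     json=body, auth=AUTH, headers=HEADERS)
    r.raise_for_status()

# Before the outage: maintenance mode first, then stop services.
set_maintenance("ON")
set_all_components("INSTALLED")

# After the server is back: start services first, then exit maintenance mode.
set_all_components("STARTED")
set_maintenance("OFF")
```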
It is unlikely that your data node will fall far enough behind in 90 minutes to matter, but you may consider running the HDFS balancer afterwards to your preferred threshold (the default is 10%).
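If you do decide to rebalance, a quick way to trigger it is shown below (sketch only, assuming the hdfs CLI is on the PATH and you run it as the HDFS superuser):

```python
# Sketch: run the HDFS balancer with the default 10% threshold after the node
# has rejoined the cluster; adjust the threshold to taste.
import subprocess

subprocess.run(["hdfs", "balancer", "-threshold", "10"], check=True)
```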
+++++
If any of the responses helped, please vote and accept best answer.
Created 10-10-2016 08:04 PM
@Kumar Veerappan
If you are taking down only one data node, you don't need a cluster downtime: the blocks on that node will simply fall into the under-replicated category, the framework will take care of re-replicating them, and the node will start receiving new data again once it comes back. As an end user, you shouldn't see any issue.
If you take downtime for the whole cluster, you may see missing-data errors until the data nodes re-register.
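One way to verify that the framework really has caught up once the node re-registers is to poll the under-replicated block count; a rough sketch follows (the exact report label is assumed and may differ slightly between Hadoop versions):

```python
# Sketch: wait until the NameNode reports no under-replicated blocks.
import subprocess, time

def under_replicated_blocks():
    report = subprocess.run(["hdfs", "dfsadmin", "-report"],
                            capture_output=True, text=True, check=True).stdout
    for line in report.splitlines():
        if "Under replicated blocks" in line:
            return int(line.split(":")[1].strip())
    return 0

while under_replicated_blocks() > 0:
    time.sleep(30)
print("No under-replicated blocks remain.")
```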
Created on 04-05-2017 03:12 PM - edited 08-19-2019 03:21 AM
I thought the proper way to do maintenance on the data node is to decommission it, so that each of its roles is handled cleanly (see the sketch after this list):
- Data Node - safely replicates its HDFS data to other DataNodes
- Node Manager - stops accepting new job requests
- Region Server - turns on drain mode
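For reference, here is a rough sketch of those decommission steps done by hand outside Ambari. The exclude-file paths and the HBase script location are assumptions: they must match your own dfs.hosts.exclude / yarn.resourcemanager.nodes.exclude-path settings and HBase installation, and Ambari exposes equivalent DataNode/NodeManager decommission actions through its UI.

```python
# Sketch only: manual decommission of one host's DataNode, NodeManager,
# and RegionServer roles. File paths below are assumptions.
import subprocess

HOST = "datanode01.example.com"            # placeholder host to decommission

# 1. DataNode: add the host to the HDFS excludes file and refresh.
with open("/etc/hadoop/conf/dfs.exclude", "a") as f:    # assumed path
    f.write(HOST + "\n")
subprocess.run(["hdfs", "dfsadmin", "-refreshNodes"], check=True)

# 2. NodeManager: add the host to the YARN excludes file and refresh.
with open("/etc/hadoop/conf/yarn.exclude", "a") as f:   # assumed path
    f.write(HOST + "\n")
subprocess.run(["yarn", "rmadmin", "-refreshNodes"], check=True)

# 3. RegionServer: drain regions off the host before stopping it.
subprocess.run(["/usr/hdp/current/hbase-client/bin/graceful_stop.sh", HOST],
               check=True)                               # assumed HDP path
```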
In an urgent situation, I could agree with your suggestion.
However, please advise on the right approach in a scenario where you have the luxury of choosing the maintenance window.
Created 11-02-2016 08:57 AM
Please vote and accept the best answer.
