For example: there are 500 nodes in the Hadoop cluster, and the Linux team wants to apply OS patches/upgrades in batches. How can we (the Hadoop admins) ensure data availability and no impact on running jobs without decommissioning? Decommissioning large, data-heavy nodes (say 90 TB each) takes forever, so is there any way to do this without decommissioning them?
Say we have rack awareness set up, with 6 nodes per rack.
This should address your concerns.
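To make the rack-awareness point concrete: with the default rack-aware block placement policy and replication factor 3, HDFS keeps replicas of every block on at least two racks, so taking one rack offline for patching still leaves a live replica of every block, with no decommissioning needed. Below is a minimal Python sketch (an assumption of mine, not something from this thread) of how per-rack patch batches could be built; it assumes `hdfs dfsadmin -printTopology` and `hdfs fsck` are runnable on the host executing it, and the `patch_one_node` hook and host names are placeholders.

```python
#!/usr/bin/env python3
"""Sketch: build per-rack patch batches from the HDFS topology.

Assumptions (not from the original post): replication factor 3, the
default rack-aware block placement policy, and a user permitted to run
`hdfs dfsadmin` and `hdfs fsck`. Host/rack names are illustrative.
"""
import subprocess
from collections import defaultdict


def racks_from_topology():
    """Group live DataNodes by rack using `hdfs dfsadmin -printTopology`."""
    out = subprocess.run(
        ["hdfs", "dfsadmin", "-printTopology"],
        check=True, capture_output=True, text=True,
    ).stdout
    racks, current = defaultdict(list), None
    for line in out.splitlines():
        line = line.strip()
        if line.startswith("Rack:"):
            current = line.split("Rack:", 1)[1].strip()
        elif line and current:
            # Node lines look roughly like "10.0.0.12:9866 (worker012.example.com)"
            racks[current].append(line.split()[0])
    return racks


def cluster_is_healthy():
    """Return True when fsck reports the filesystem as healthy."""
    out = subprocess.run(
        ["hdfs", "fsck", "/"], capture_output=True, text=True,
    ).stdout
    return "Status: HEALTHY" in out


def patch_rack_by_rack(patch_one_node):
    """Patch nodes one rack at a time; with rack awareness every block
    keeps at least one live replica on another rack while a rack is down."""
    for rack, nodes in racks_from_topology().items():
        if not cluster_is_healthy():
            raise RuntimeError("fsck reports problems; stop patching")
        print(f"Patching rack {rack}: {nodes}")
        for node in nodes:
            patch_one_node(node)  # e.g. hand this host to the Linux team
        # Wait here until all DataNodes in this rack re-register
        # (`hdfs dfsadmin -report`) before moving to the next rack.
```

Between racks, wait for the patched DataNodes to re-register and for under-replicated block counts to drop back before starting the next batch.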
As always, be cautious with a production cluster: test and document the procedure in DEV, UAT, or pre-PROD first. Don't say you weren't warned 🙂
Happy Hadooping !!