Created 02-15-2017 10:17 AM
Hi,
We have a 3-node HDP cluster with Ambari 2.4. We run TeraSort jobs for benchmarking.
I would like to know how to hot-swap a failed DataNode hard disk without stopping the cluster services and without interrupting an ongoing TeraSort job?
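For context, the benchmark is the standard TeraGen/TeraSort pair from the Hadoop examples jar; we run it with something like this (the HDP jar path is assumed):

```
# Generate ~10 GB of input (100,000,000 rows of 100 bytes each)
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
    teragen 100000000 /tmp/teragen

# Sort it
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar \
    terasort /tmp/teragen /tmp/terasort
```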
Thanks,
Steevan
Created 02-16-2017 10:18 AM
Unless the hardware supports hot-swapping, you are going to have to shut the server down. If you do this quickly enough, HDFS won't overreact by trying to re-replicate data: it gives you 10-15 minutes to get the machine opened up, the new disk inserted and mounted, and the services restarted. It's good to format the disk in advance, to save that bit of the process.
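That 10-15 minute window comes from the NameNode's dead-node interval, which is derived from two hdfs-site.xml properties. A quick way to check what your cluster is actually using (property names are the stock hdfs-default.xml ones):

```
# The NameNode declares a DataNode dead after roughly:
#   2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval
# With the defaults (300000 ms and 3 s) that works out to 10.5 minutes.
hdfs getconf -confKey dfs.namenode.heartbeat.recheck-interval
hdfs getconf -confKey dfs.heartbeat.interval
```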
Expect a lot of alert messages from the YARN side of things, which will notice within 60 seconds, and from Spark, which will react even faster. Spark is likely to fail the job.
It is probably safest to turn off the YARN NodeManager on that node (and HBase, if it is running there), so the scheduler doesn't get upset. Spark will be told of the decommissioning event and won't treat failures there as a problem.
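As a sketch, stopping and restarting the NodeManager on just that host can be done from the Ambari UI, or out-of-band on the node itself; the daemon-script path below is the usual HDP layout, so verify it on your install:

```
# Stop the NodeManager on the affected node before pulling the disk
su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh stop nodemanager"

# Start it again once the new disk is mounted and services are back
su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager"
```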
There's a more rigorous approach documented in "Replacing disks on a DataNode"; it recommends a full HDFS node decommission. On a three-node cluster that's likely to be problematic: there won't be anywhere to re-replicate the third replica of every triply-replicated block.
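If you do take the full-decommission route on a larger cluster, the usual sequence is an exclude-file edit plus a refresh; the file location and hostname below are placeholders (use whatever dfs.hosts.exclude points at on your cluster):

```
# 1. Add the DataNode's hostname to the exclude file named by
#    dfs.hosts.exclude in hdfs-site.xml
echo "dn3.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Tell the NameNode to re-read it; the node enters "Decommissioning"
hdfs dfsadmin -refreshNodes

# 3. Watch until the node reports "Decommissioned" before touching hardware
hdfs dfsadmin -report
```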
Created 02-16-2017 11:19 AM
Thank you Stevel for the answer.
Yes, let us assume that the hardware and the OS (CentOS in my case) support hot swapping. You say it is difficult in a 3-node cluster. So if I have 5 to 6 nodes, can I hot-swap the disk without disturbing a currently running Spark job?
Created 02-17-2017 01:45 PM
It's not so much that hot swap is difficult, but that with a 3-node cluster, a copy of every block is kept on every node. A cold swap, where HDFS notices things are missing, is the traumatic one, as it cannot re-replicate all the blocks and will complain about under-replication. If you can do a hot swap in the OS and hardware, then you should stop the DataNode before doing that, and start it afterwards. It will examine its directories and report all the blocks it holds to the NameNode. If the cluster has under-replicated blocks, the DataNode will be told to copy them from the other two DataNodes; how long that takes depends on the number of blocks which were on the swapped disk (and which haven't already been considered missing and re-replicated onto other disks on the same DataNode).
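For what it's worth, Hadoop 2.6 and later can also re-read dfs.datanode.data.dir on a live DataNode, which avoids even the DN restart; a rough sketch follows (device, mount point, and host:port are placeholders, and you should confirm the feature exists in your HDP release):

```
# 1. Format and mount the replacement disk (example device and mount point)
mkfs -t ext4 /dev/sdd1
mount /dev/sdd1 /grid/2
chown -R hdfs:hadoop /grid/2

# 2. Make sure /grid/2 is listed in dfs.datanode.data.dir in hdfs-site.xml
#    on this node, then ask the running DataNode to reconfigure itself
hdfs dfsadmin -reconfig datanode <dn-host>:<dn-ipc-port> start

# 3. Poll until the reconfiguration task reports it has finished
hdfs dfsadmin -reconfig datanode <dn-host>:<dn-ipc-port> status
```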
Maybe @Arpit Agarwal has some other/different advice. Arpit, presumably the new HDD will be unbalanced compared to the rest of the disks on the DN. What can be done about that in HDFS?
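Two possibilities I can think of, both to be checked against the Hadoop version in use: older releases can at least bias new writes toward the emptier disk, and Hadoop 3.0 adds an intra-DataNode disk balancer (the hostname and plan path below are placeholders):

```
# Option 1 (Hadoop 2.x): prefer emptier volumes when placing new blocks.
# In hdfs-site.xml on the DataNode, set:
#   dfs.datanode.fsdataset.volume.choosing.policy =
#     org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy

# Option 2 (Hadoop 3.0+): actively move existing blocks between disks.
# Requires dfs.disk.balancer.enabled=true on the DataNode, then:
hdfs diskbalancer -plan dn3.example.com
hdfs diskbalancer -execute /system/diskbalancer/<timestamp>/dn3.example.com.plan.json
hdfs diskbalancer -query dn3.example.com
```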
Created 02-18-2017 04:06 AM
Thank you, Steve, for those insights. They are very helpful for beginners like me.