<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: What are the steps an operator should take to replace disk in data node? Correction - NameNode in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-are-the-steps-an-operator-should-take-to-replace-disk/m-p/96718#M10245</link>
    <description>&lt;A rel="user" href="https://community.cloudera.com/users/461/vsomani.html" nodeid="461"&gt;@vsomani@hortonworks.com&lt;/A&gt;&lt;P&gt;NameNode disk failure: there are a couple of ifs.&lt;/P&gt;&lt;P&gt;Scenario 1 - HA + RAID 10&lt;/P&gt;&lt;P&gt;If HA is in place, fail over to the passive NN (assuming the active NN's disk failed). If RAID 10 is configured for the NN, you are safe and have enough time to replace the failed disk.&lt;/P&gt;&lt;P&gt;"When a single disk in a RAID 10 disk array fails, the disk array status changes to Degraded. The disk array remains functional because the data on the Failed disk is also stored on the other member of its mirrored pair. Whenever a disk fails, replace it as soon as possible. If a hot spare disk is available, the controller can rebuild the data on the disk automatically. If a hot spare disk is not available, you will need to replace the failed disk and then initiate a rebuild."&lt;/P&gt;&lt;P&gt;Scenario 2 - No HA, no RAID, but an NN backup is in place and "dfs.namenode.name.dir" writes to multiple disks.&lt;/P&gt;&lt;P&gt;You are safe because the NN metadata is written to multiple disks, so you can remove the failed disk location from Ambari and let the operator recover from the disk failure.&lt;/P&gt;&lt;P&gt;Scenario 3 - Bad design: no HA, no RAID, dfs.namenode.name.dir writes to a single disk.&lt;/P&gt;&lt;P&gt;The cluster is down. Back up everything that you can from the NN. Let the operator replace the disk. Restore the backup and then start the troubleshooting process.&lt;/P&gt;&lt;P&gt;Good discussion here: &lt;A target="_blank" href="http://stackoverflow.com/questions/9712151/recover-hadoop-namenode-failure"&gt;1&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 08 Nov 2015 20:57:35 GMT</pubDate>
    <dc:creator>nsabharwal</dc:creator>
    <dc:date>2015-11-08T20:57:35Z</dc:date>
    <item>
      <title>What are the steps an operator should take to replace disk in data node? Correction - NameNode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-are-the-steps-an-operator-should-take-to-replace-disk/m-p/96716#M10243</link>
      <description>&lt;P&gt;A partner I am working with is looking for instructions to replace a disk in a DataNode host.&lt;/P&gt;&lt;P&gt;They found the instructions for replacing a disk on a DataNode here: &lt;A href="http://www.cloudera.com/content/www/en-us/documentation/manager/5-0-x/Cloudera-Manager-Managing-Clusters/cm5mc_dn_disk.html" target="_blank"&gt;http://www.cloudera.com/content/www/en-us/documentation/manager/5-0-x/Cloudera-Manager-Managing-Clusters/cm5mc_dn_disk.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;But they could not find anything on the steps that an operator should take to replace a disk in a NameNode.&lt;/P&gt;&lt;P&gt;Are there steps, or a pointer to a doc that has these steps?&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Sat, 07 Nov 2015 13:41:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-are-the-steps-an-operator-should-take-to-replace-disk/m-p/96716#M10243</guid>
      <dc:creator>VSomani</dc:creator>
      <dc:date>2015-11-07T13:41:34Z</dc:date>
    </item>
    <item>
      <title>Re: What are the steps an operator should take to replace disk in data node? Correction - NameNode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-are-the-steps-an-operator-should-take-to-replace-disk/m-p/96717#M10244</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/461/vsomani.html" nodeid="461"&gt;@vsomani@hortonworks.com&lt;/A&gt;
&lt;/P&gt;&lt;P&gt;The steps to replace a disk in slave nodes, or to perform maintenance on slave node servers, are the same irrespective of Hadoop distribution. We don't have dedicated steps in our docs AFAIK, but the steps should be as follows.&lt;/P&gt;&lt;P&gt;1. Decommission the DataNode and all services running on it, i.e. NodeManager, HBase RegionServer, DataNode etc. See the references below.&lt;/P&gt;&lt;P&gt;&lt;A href="http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.0/bk_Ambari_Users_Guide/content/_decommissioning_masters_and_slaves_.html" target="_blank"&gt;http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.0/bk_Ambari_Users_Guide/content/_decommissioning_masters_and_slaves_.html&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;A href="http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Sys_Admin_Guides/content/ch_slave_nodes.html" target="_blank"&gt;http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Sys_Admin_Guides/content/ch_slave_nodes.html&lt;/A&gt; &lt;/P&gt;&lt;P&gt;2. Replace the disks or perform any other server maintenance tasks.&lt;/P&gt;&lt;P&gt;3. Recommission the node.&lt;/P&gt;&lt;P&gt;4. Start all service components on the node.&lt;/P&gt;&lt;P&gt;5. Run fsck on HDFS to ensure that HDFS is in a healthy state. The fsck report might show a few over-replicated blocks, which will be fixed automatically.&lt;/P&gt;</description>
      <pubDate>Sun, 08 Nov 2015 17:05:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-are-the-steps-an-operator-should-take-to-replace-disk/m-p/96717#M10244</guid>
      <dc:creator>pardeep_kumar</dc:creator>
      <dc:date>2015-11-08T17:05:36Z</dc:date>
    </item>
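Step 5 in the reply above (run fsck and check the report) can be sketched as a small parser. This is only an illustration, assuming the report text has already been captured, for example from `hdfs fsck /`. The function name `summarize_fsck` and the sample excerpt are hypothetical, and the exact report layout may vary between Hadoop versions.

```python
import re

def summarize_fsck(report: str) -> dict:
    """Extract the overall status and over-replicated block count from fsck text."""
    # fsck closes its report with a line such as:
    #   The filesystem under path '/' is HEALTHY
    status = re.search(r"The filesystem under path '([^']*)' is (\w+)", report)
    over = re.search(r"Over-replicated blocks:\s*(\d+)", report)
    return {
        "path": status.group(1) if status else None,
        "healthy": bool(status) and status.group(2) == "HEALTHY",
        "over_replicated": int(over.group(1)) if over else None,
    }

# Hypothetical excerpt of a report, not output from a real cluster.
SAMPLE = """\
 Over-replicated blocks:        3 (0.5 %)
 Under-replicated blocks:       0 (0.0 %)
The filesystem under path '/' is HEALTHY
"""

print(summarize_fsck(SAMPLE))
```

A non-zero over-replicated count right after recommissioning matches what the reply describes and should shrink on later runs as HDFS removes the extra replicas.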
    <item>
      <title>Re: What are the steps an operator should take to replace disk in data node? Correction - NameNode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-are-the-steps-an-operator-should-take-to-replace-disk/m-p/96718#M10245</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/461/vsomani.html" nodeid="461"&gt;@vsomani@hortonworks.com&lt;/A&gt;&lt;P&gt;NameNode disk failure: there are a couple of ifs.&lt;/P&gt;&lt;P&gt;Scenario 1 - HA + RAID 10&lt;/P&gt;&lt;P&gt;If HA is in place, fail over to the passive NN (assuming the active NN's disk failed). If RAID 10 is configured for the NN, you are safe and have enough time to replace the failed disk.&lt;/P&gt;&lt;P&gt;"When a single disk in a RAID 10 disk array fails, the disk array status changes to Degraded. The disk array remains functional because the data on the Failed disk is also stored on the other member of its mirrored pair. Whenever a disk fails, replace it as soon as possible. If a hot spare disk is available, the controller can rebuild the data on the disk automatically. If a hot spare disk is not available, you will need to replace the failed disk and then initiate a rebuild."&lt;/P&gt;&lt;P&gt;Scenario 2 - No HA, no RAID, but an NN backup is in place and "dfs.namenode.name.dir" writes to multiple disks.&lt;/P&gt;&lt;P&gt;You are safe because the NN metadata is written to multiple disks, so you can remove the failed disk location from Ambari and let the operator recover from the disk failure.&lt;/P&gt;&lt;P&gt;Scenario 3 - Bad design: no HA, no RAID, dfs.namenode.name.dir writes to a single disk.&lt;/P&gt;&lt;P&gt;The cluster is down. Back up everything that you can from the NN. Let the operator replace the disk. Restore the backup and then start the troubleshooting process.&lt;/P&gt;&lt;P&gt;Good discussion here: &lt;A target="_blank" href="http://stackoverflow.com/questions/9712151/recover-hadoop-namenode-failure"&gt;1&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 08 Nov 2015 20:57:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-are-the-steps-an-operator-should-take-to-replace-disk/m-p/96718#M10245</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2015-11-08T20:57:35Z</dc:date>
    </item>
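For scenario 2 in the reply above, dfs.namenode.name.dir holds a comma-separated list of metadata directories, so removing the failed disk location amounts to dropping one entry from that list. A minimal sketch of that edit follows. The helper name `drop_failed_dir` and the mount points are hypothetical, and in an Ambari-managed cluster the change would be made through Ambari rather than by hand.

```python
def drop_failed_dir(name_dirs: str, failed: str) -> str:
    """Return the dfs.namenode.name.dir value without the failed directory."""
    dirs = [d.strip() for d in name_dirs.split(",") if d.strip()]
    remaining = [d for d in dirs if d != failed]
    if not remaining:
        # Single-directory case (scenario 3): nothing healthy is left to fall back on.
        raise ValueError("no healthy metadata directory would remain")
    return ",".join(remaining)

# Hypothetical mount points, used only for illustration.
current = "/data1/hadoop/hdfs/namenode,/data2/hadoop/hdfs/namenode"
print(drop_failed_dir(current, "/data2/hadoop/hdfs/namenode"))
```

The explicit error for an empty result mirrors the distinction the reply draws between scenario 2 and scenario 3: with a single directory, removing it leaves no copy of the metadata.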
    <item>
      <title>Re: What are the steps an operator should take to replace disk in data node? Correction - NameNode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-are-the-steps-an-operator-should-take-to-replace-disk/m-p/96719#M10246</link>
      <description>&lt;P&gt;Thanks Neeraj.&lt;/P&gt;&lt;P&gt;In this case, the partner has HA but no RAID, so they'll just need to fail over to the passive NN.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Nov 2015 01:58:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-are-the-steps-an-operator-should-take-to-replace-disk/m-p/96719#M10246</guid>
      <dc:creator>VSomani</dc:creator>
      <dc:date>2015-11-10T01:58:00Z</dc:date>
    </item>
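The failover mentioned above could be driven with the `hdfs haadmin` tool. The sketch below only builds the command lines and does not run them. The service IDs `nn1` and `nn2` are placeholders for whatever dfs.ha.namenodes defines in the cluster, and if automatic failover is enabled and the NameNode process has already died, the standby may have taken over before any manual step.

```python
def failover_commands(failed_nn: str, healthy_nn: str) -> list:
    """Build the haadmin calls for checking state and failing over."""
    return [
        # Check which NameNode is currently active and which is standby.
        ["hdfs", "haadmin", "-getServiceState", failed_nn],
        ["hdfs", "haadmin", "-getServiceState", healthy_nn],
        # Fail over from the NameNode with the bad disk to the healthy one.
        ["hdfs", "haadmin", "-failover", failed_nn, healthy_nn],
    ]

# nn1 and nn2 are hypothetical service IDs.
for argv in failover_commands("nn1", "nn2"):
    print(" ".join(argv))
```

Each list could be passed to `subprocess.run` on a host with the HDFS client configured. Keeping the commands as data makes it easy to review them before anything touches the cluster.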
    <item>
      <title>Re: What are the steps an operator should take to replace disk in data node? Correction - NameNode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-are-the-steps-an-operator-should-take-to-replace-disk/m-p/96720#M10247</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/461/vsomani.html" nodeid="461"&gt;@vsomani@hortonworks.com&lt;/A&gt; Sounds good. &lt;/P&gt;</description>
      <pubDate>Tue, 10 Nov 2015 01:59:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-are-the-steps-an-operator-should-take-to-replace-disk/m-p/96720#M10247</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2015-11-10T01:59:33Z</dc:date>
    </item>
    <item>
      <title>Re: What are the steps an operator should take to replace disk in data node? Correction - NameNode</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-are-the-steps-an-operator-should-take-to-replace-disk/m-p/96721#M10248</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/140/nsabharwal.html" nodeid="140"&gt;@Neeraj&lt;/A&gt; Should we keep this answer or remove it? It looks like &lt;A rel="user" href="https://community.cloudera.com/users/461/vsomani.html" nodeid="461"&gt;@vsomani@hortonworks.com&lt;/A&gt; changed the question. I have created an article out of it: &lt;A href="http://community.hortonworks.com/articles/3131/replacing-disk-on-datanode-hosts.html" target="_blank"&gt;http://community.hortonworks.com/articles/3131/replacing-disk-on-datanode-hosts.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Nov 2015 09:59:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/What-are-the-steps-an-operator-should-take-to-replace-disk/m-p/96721#M10248</guid>
      <dc:creator>pardeep_kumar</dc:creator>
      <dc:date>2015-11-10T09:59:53Z</dc:date>
    </item>
  </channel>
</rss>

