How to change a disk used by a Hadoop cluster

Explorer

Our cluster is running HDP 2.3.4.0, and one of the hosts is showing 99% disk usage on /dev/sda1 (see the attachment), whereas there is plenty of space available on /dev/sdb1. Ambari selected /dev/sda1 by default during cluster setup (I don't know how). Can I somehow change the disk from /dev/sda1 to /dev/sdb1 without disturbing or losing any data in the cluster? If not, what is the best alternative? Please suggest.
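
(For reference, a quick way to confirm which mounts the DataNode is actually writing to, using standard Linux/HDFS commands; dfs.datanode.data.dir is the stock property name, so adjust if your configuration differs:)

    # Show usage of the two filesystems in question
    df -h /dev/sda1 /dev/sdb1

    # Print the directories the DataNode is configured to store blocks in
    hdfs getconf -confKey dfs.datanode.data.dir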

1 ACCEPTED SOLUTION

Guru

From the Hadoop FAQ on Apache:

3.12. On an individual data node, how do you balance the blocks on the disk?

Hadoop currently does not have a method by which to do this automatically. To do this manually:

  1. Shut down the DataNode involved.
  2. Use the UNIX mv command to move the individual block replica and meta pairs from one directory to another on the selected host. On releases which have HDFS-6482 (Apache Hadoop 2.6.0+), you also need to ensure the subdir-named directory structure remains exactly the same when moving the blocks across the disks. For example, if the block replica and its meta pair were under /data/1/dfs/dn/current/BP-1788246909-172.23.1.202-1412278461680/current/finalized/subdir0/subdir1/, and you wanted to move them to the /data/5/ disk, then they MUST be moved into the same subdirectory structure underneath it, i.e. /data/5/dfs/dn/current/BP-1788246909-172.23.1.202-1412278461680/current/finalized/subdir0/subdir1/. If this is not maintained, the DN will no longer be able to locate the replicas after the move. (A sketch of this step follows the list.)
  3. Restart the DataNode.
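
A minimal sketch of step 2 in shell, reusing the example paths from the FAQ quote above (the block and meta filenames below are hypothetical placeholders; substitute the actual replica you are moving):

    # Step 1: stop the DataNode before touching anything on disk.
    # SUBDIR reuses the block-pool path from the FAQ example above.
    SUBDIR=dfs/dn/current/BP-1788246909-172.23.1.202-1412278461680/current/finalized/subdir0/subdir1

    # Recreate exactly the same subdirectory structure on the target disk
    mkdir -p /data/5/$SUBDIR

    # Move the block replica together with its .meta pair
    # (blk_1073741825 / blk_1073741825_1001.meta are made-up names)
    mv /data/1/$SUBDIR/blk_1073741825           /data/5/$SUBDIR/
    mv /data/1/$SUBDIR/blk_1073741825_1001.meta /data/5/$SUBDIR/

    # Step 3: restart the DataNode so it rescans its volumes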

However, this is not something that I recommend. A cleaner approach is to decommission the node, change the mount point, and add it back to the cluster. I say cleaner because directly touching the data directory can corrupt your data with a single misstep.
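
If you go the decommission route by hand (outside Ambari), the stock HDFS mechanism looks roughly like this. This is a sketch, not a full runbook: it assumes dfs.hosts.exclude is already configured in hdfs-site.xml, and the hostname and exclude-file path are placeholders for your own values:

    # On the NameNode host: add the DataNode to the exclude file
    # (use whatever path dfs.hosts.exclude points at in hdfs-site.xml)
    echo "datanode01.example.com" >> /etc/hadoop/conf/dfs.exclude

    # Tell the NameNode to re-read its include/exclude lists
    hdfs dfsadmin -refreshNodes

    # Wait until the node shows as "Decommissioned" before stopping it
    hdfs dfsadmin -report

    # After remounting /dev/sdb1 and updating the data dir, remove the
    # host from the exclude file and run -refreshNodes again

HDFS re-replicates the node's blocks elsewhere during decommissioning, which is why no data is lost.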


4 REPLIES

Explorer

Hi Ravi, the second approach sounds good to me. Is there a way to decommission a node using Ambari? More detail on that approach would really help me.

Guru