Support Questions

How to change a disk used by a Hadoop cluster?

Explorer

Our cluster is running HDP 2.3.4.0, and one of the hosts is showing 99% disk usage on /dev/sda1 (see the attachment), whereas there is plenty of space available on /dev/sdb1. Ambari selected /dev/sda1 by default during cluster setup (I don't know how). Can I somehow switch from /dev/sda1 to /dev/sdb1 without disturbing or losing any data in the cluster? If not, what is the best alternative? Please suggest.

1 ACCEPTED SOLUTION

Guru

From the Hadoop FAQ on Apache:

3.12. On an individual data node, how do you balance the blocks on the disk?

Hadoop currently does not have a method by which to do this automatically. To do this manually:

  1. Shut down the DataNode involved.
  2. Use the UNIX mv command to move the individual block replica and meta pairs from one directory to another on the selected host. On releases which have HDFS-6482 (Apache Hadoop 2.6.0+) you also need to ensure the subdir-named directory structure remains exactly the same when moving the blocks across the disks. For example, if the block replica and its meta pair were under /data/1/dfs/dn/current/BP-1788246909-172.23.1.202-1412278461680/current/finalized/subdir0/subdir1/, and you wanted to move it to the /data/5/ disk, then it MUST be moved into the same subdirectory structure underneath that, i.e. /data/5/dfs/dn/current/BP-1788246909-172.23.1.202-1412278461680/current/finalized/subdir0/subdir1/. If this is not maintained, the DN will no longer be able to locate the replicas after the move (a shell sketch of these steps follows this list).
  3. Restart the DataNode.
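
For illustration, here is a minimal shell sketch of those three steps, reusing the block pool ID and subdirectories from the FAQ example above. The block file names and the hadoop-daemon.sh invocation are assumptions, so adjust them to your host:

    # Shut down the DataNode first (on an Ambari-managed cluster you would
    # normally stop it from the Ambari UI instead).
    sudo -u hdfs hadoop-daemon.sh stop datanode

    # Recreate the identical subdir structure on the target disk, then move
    # a block replica together with its meta file (block file names here
    # are hypothetical).
    SRC=/data/1/dfs/dn/current/BP-1788246909-172.23.1.202-1412278461680/current/finalized/subdir0/subdir1
    DST=/data/5/dfs/dn/current/BP-1788246909-172.23.1.202-1412278461680/current/finalized/subdir0/subdir1
    mkdir -p "$DST"
    mv "$SRC/blk_1073741825" "$SRC/blk_1073741825_1001.meta" "$DST/"

    # Restart the DataNode so it rescans its volumes.
    sudo -u hdfs hadoop-daemon.sh start datanode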

However, this is not something that I recommend. A cleaner approach is to decommission the node, change the mount point, and add the node back to the cluster. I say cleaner because directly touching the data directory can corrupt your data with a single misstep.
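
For reference, a rough sketch of that decommission flow using the stock HDFS tooling; the hostname and exclude-file path are assumptions, and on an Ambari-managed cluster the "Decommission" host action drives the same mechanism for you:

    # 1. Add the host to the exclude file referenced by the
    #    dfs.hosts.exclude property in hdfs-site.xml.
    echo "worker01.example.com" >> /etc/hadoop/conf/dfs.exclude

    # 2. Tell the NameNode to re-read its include/exclude lists; it then
    #    starts re-replicating the node's blocks to other DataNodes.
    sudo -u hdfs hdfs dfsadmin -refreshNodes

    # 3. Wait until the node shows "Decommissioned" in the report, then
    #    remount the data directory on /dev/sdb1, remove the host from the
    #    exclude file, run -refreshNodes again, and restart the DataNode.
    sudo -u hdfs hdfs dfsadmin -report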


4 REPLIES

Explorer

Hi Ravi, the second approach sounds good to me. Is there a way to decommission a node using Ambari? More detail on that approach would really help me.

Guru