
HDFS DataNode Replication Policy on Multiple Disks

Rising Star

System: HDP 2.4.2.0, Ambari 2.2.2.0

Machines: 5 DataNode servers out of 10 servers

DataNode server disk volume info: 3 TB x 10

DataNode directories in the Ambari web UI:

Only Disk1 (/data1) has two DataNode directories per DataNode server:

Disk Volume 1: /data1/hadoop/hdfs/data, /data1/crash/hadoop/hdfs/data

-> I don't know why the crash directory was added only on the Disk1 volume (/data1).

Disk Volume 2: /data2/hadoop/hdfs/data

....

Disk Volume 5: /data5/hadoop/hdfs/data

Here are my questions.

Q1. When I put a large file into a certain HDFS directory (/dataset), I'm wondering how the HDFS replication policy distributes replicas across each server's disks.

Example: dfs.replication = 3, a 10 GB file in HDFS's /dataset.

How are that file's block replicas spread across the disk volumes of the DataNodes?

Case 1. BlockPool - blk_..., blk_....meta -> DataNode1 - disk1 | DataNode8 - disk2 | DataNode3 - disk6 ......

Case 2. BlockPool - blk_..., blk_....meta -> DataNode1 - disk1 | DataNode2 - disk1 | DataNode7 - disk1 ...... | DataNode$ - disk1

Which one is the correct HDFS replication and block placement policy? Is Case 2 possible?

Q2. How can I safely move the data in one DataNode directory (/data1/crash/hadoop/hdfs/data) onto the other disk volumes (2~5), either into the existing DataNode directories on those disks or into new DataNode directories on them?

I ask because Disk1 (/data1) is heading toward a disk-full issue faster than the other disks: up to double the block data and meta files end up stored on Disk1.

So I need to know how to handle the data in /data1/crash/hadoop/hdfs/data before removing that path in Ambari - HDFS - Configs - Settings - DataNode directories.

1 ACCEPTED SOLUTION


Hi @Peter Kim,

The NameNode selects a set of DataNodes for placing the replicas of a newly allocated block. Each DataNode then independently selects the target disk for its replica using a round-robin policy. So replica placement looks like your Case 1, i.e.

Case 1. BlockPool - blk_..., blk_....meta -> DataNode1 - disk1 | DataNode8 - disk2 | DataNode3 - disk6 ......
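For illustration (not part of the original answer): a 10 GB file with the default 128 MB block size is split into 80 blocks, and with dfs.replication = 3 that means 240 block replicas; each replica independently lands on one disk of whichever DataNode holds it. You can see which DataNodes hold each block's replicas with fsck. The path /dataset/bigfile.dat below is a hypothetical example, and note that fsck reports DataNode addresses, not the disk chosen inside each DataNode:

# List a file's blocks and the DataNodes holding each replica
hdfs fsck /dataset/bigfile.dat -files -blocks -locations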

There is no good way to redistribute blocks across disks on a DataNode, as @Hari Rongali mentioned. However, a Disk Balancer feature is under development to address this use case.
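For readers on later releases: this eventually shipped as the hdfs diskbalancer tool in Hadoop 3.x; it is not available on HDP 2.4.2. A rough sketch of its usage, with datanode1.example.com as a placeholder hostname:

# Generate a rebalancing plan for one DataNode with unevenly used disks
hdfs diskbalancer -plan datanode1.example.com
# Execute the plan (the plan file path is printed by the -plan step)
hdfs diskbalancer -execute <planfile>.plan.json
# Check progress on that DataNode
hdfs diskbalancer -query datanode1.example.com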

Also, if I understand correctly, you have two DataNode storage directories on one physical volume. We do not recommend that, as it will hurt performance. You should have a one-to-one relationship between storage directories and physical volumes (assuming you are using disks in the recommended JBOD configuration).
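A minimal sketch of what that would look like, assuming your five data volumes are mounted as /data1 through /data5 (the /data2-/data5 mount points are an assumption based on your listing). The DataNode directories value in Ambari - HDFS - Configs would then contain exactly one directory per volume:

/data1/hadoop/hdfs/data,/data2/hadoop/hdfs/data,/data3/hadoop/hdfs/data,/data4/hadoop/hdfs/data,/data5/hadoop/hdfs/data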


REPLIES

Expert Contributor

@Peter Kim

A1) Within a DataNode, HDFS block placement by default works in a round-robin fashion across all the directories specified by dfs.data.dir (dfs.datanode.data.dir in Hadoop 2).
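For reference, the intra-DataNode disk selection is controlled by the dfs.datanode.fsdataset.volume.choosing.policy property in hdfs-site.xml. A sketch of the two built-in options (verify the class names against your Hadoop version):

# Default: round-robin across the configured data directories
dfs.datanode.fsdataset.volume.choosing.policy = org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy
# Alternative: prefer volumes with more free space for new writes (it does not move existing blocks)
dfs.datanode.fsdataset.volume.choosing.policy = org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy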

A2) It's not safe to simply move one directory's contents into another. There is no feature for intra-DataNode disk balancing at the moment. The safest way to accomplish this is to decommission the DataNode, let HDFS re-replicate the missing blocks across the cluster, reformat the drives on the decommissioned node, and then recommission the DataNode. Make sure you also run the balancer utility afterwards to distribute the blocks evenly across the cluster.
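A rough sketch of that workflow, assuming the exclude file is managed by hand rather than through Ambari (the hostname and exclude-file path are placeholders; on an Ambari-managed cluster you would normally use the Decommission/Recommission host actions in the UI instead):

# 1. Add the DataNode's hostname to the file referenced by dfs.hosts.exclude
echo "datanode1.example.com" >> /etc/hadoop/conf/dfs.exclude
# 2. Ask the NameNode to re-read the host lists and start decommissioning
hdfs dfsadmin -refreshNodes
# 3. Wait until the node is reported as "Decommissioned"
hdfs dfsadmin -report
# 4. Clean/reformat the node's data directories, remove it from the exclude file,
#    then refresh again to recommission it
hdfs dfsadmin -refreshNodes
# 5. Redistribute blocks evenly across DataNodes
hdfs balancer -threshold 10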
