HDFS DataNode Replication Policy on Multiple Disks

Rising Star

System: HDP 2.4.2.0, Ambari 2.2.2.0

Machines: 5 DataNode servers out of 10 servers

DataNode server disk volumes: 3 TB x 10

Datanode Directory in Ambari web :

DataNode directories (only Disk1 (/data1) has two DataNode directories per DataNode server):

Disk Volume 1: /data1/hadoop/hdfs/data, /data1/crash/hadoop/hdfs/data

-> I don't know why the crash directory was added on the Disk1 volume (/data1) only.

Disk Volume 2: /data2/hadoop/hdfs/data

....

Disk Volume 5: /data5/hadoop/hdfs/data

Here are my questions.

Q1. When I put a large file into a certain HDFS directory (/dataset), how does the HDFS replication policy distribute the replicas across each server's disks?

e.g. dfs.replication = 3, a 10 GB file in HDFS's /dataset

How are that file's replicated blocks laid out across the disk volumes of the DataNodes? For example:

Case 1. Block pool - blk_..., blk_...meta -> Datanode1 - disk1 | Datanode8 - disk2 | Datanode3 - disk6 ......

Case 2. Block pool - blk_..., blk_...meta -> Datanode1 - disk1 | Datanode2 - disk1 | Datanode7 - disk1 ...... | DatanodeN - disk1

Which case matches Hadoop HDFS's actual replication and data distribution policy?

Is Case 2 possible?
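
For reference, one way to check where a file's block replicas actually land is HDFS fsck (a minimal sketch; /dataset is the example path above, and fsck reports the DataNodes holding each replica, not the individual disks inside a node):

# List every block of the files under /dataset and the DataNodes holding each replica.
hdfs fsck /dataset -files -blocks -locations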

Q2. How can I safely move one DataNode directory's data (/data1/crash/hadoop/hdfs/data) to the DataNode directories on the other disk volumes (disks 2~5), either on the same DataNode or on another DataNode?

Because of this, disk1 (/data1) hits the disk-full issue faster than the other disks.

Since Disk1 holds two DataNode directories, up to double the block data and meta files are stored on it.

So I need to know how to handle the data in /data1/crash/hadoop/hdfs/data before removing that path from Ambari - HDFS - Configs - Settings - DataNode directories.
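
For reference, a quick way to confirm the imbalance on one DataNode with standard Linux tools (a minimal sketch; it assumes the other disks are mounted at /data2 ~ /data5 as in the layout above):

# Compare how full each data disk is.
df -h /data1 /data2 /data3 /data4 /data5

# See how much of /data1 is used by each of its two DataNode directories.
du -sh /data1/hadoop/hdfs/data /data1/crash/hadoop/hdfs/data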

2 REPLIES

Expert Contributor

@Peter Kim

A1) HDFS block placement by default works in a round-robin fashion across all the directories specified by dfs.data.dir.
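
To see the directory list that the round-robin placement cycles through, you can read the effective setting on a DataNode (a minimal sketch; dfs.datanode.data.dir is the current name of the property, with dfs.data.dir being the older, deprecated name used in the reply above):

# Print the configured DataNode data directories.
hdfs getconf -confKey dfs.datanode.data.dir

# Overall usage summary per DataNode (reported per node, not per disk).
hdfs dfsadmin -report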

A2) It's not safe to move the contents of one directory to another by hand, and there is no intra-node disk balancing feature at the moment. The safest way to accomplish this is to decommission the DataNode, let HDFS re-replicate the missing blocks across the cluster, reformat the drives on the decommissioned node, and then recommission it. Make sure you run the balancer utility afterwards to distribute the blocks evenly across the cluster.
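
A rough command-line outline of that process (a minimal sketch; the exclude-file path below is an assumed example and depends on your dfs.hosts.exclude setting, and on an Ambari-managed cluster the decommission/recommission steps are normally driven from the Ambari UI):

# 1. Add the DataNode's hostname to the exclude file referenced by dfs.hosts.exclude.
echo "datanode1.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Ask the NameNode to re-read the host lists and start decommissioning,
#    then wait until the node shows "Decommissioned" in the report.
hdfs dfsadmin -refreshNodes
hdfs dfsadmin -report

# 3. Clean/reformat the drives on the decommissioned node, remove its hostname
#    from the exclude file, and refresh again to recommission it.
hdfs dfsadmin -refreshNodes

# 4. Rebalance blocks across the cluster afterwards.
hdfs balancer -threshold 10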
