Created 06-21-2023 07:26 AM
we are planning to separate the cluster hosts into 2 physical racks, with 10 hosts in each rack.
our question is:
since we only have two physical racks, is it still recommended to use the "rack" option in Cloudera Manager to separate the hosts?
Created 06-21-2023 12:05 PM
Hi @yagoaparecidoti.
I'm not an expert but I checked documentation and found this:
Specifying Racks for Hosts
If I'm reading correctly, you will have a total of twenty hosts across two racks, so this statement would apply.
Cloudera Manager includes internal rack awareness scripts, but you must specify the racks where the hosts in your cluster are located. If your cluster contains more than 10 hosts, Cloudera recommends that you specify the rack for each host. HDFS, MapReduce, and YARN will automatically use the racks you specify.
I hope this helps.
Created 06-21-2023 01:11 PM
hi @cjervis
I had already read that documentation before; thank you for your help.
What I really wanted to know is whether two racks are enough, or whether three would be the recommendation.
Created 06-22-2023 02:27 AM
Hi @yagoaparecidoti, with a replication factor of 3, the BlockPlacementPolicyDefault will put one replica on the local machine if the writer is on a DataNode (otherwise on a random DataNode in the same rack as the writer), another replica on a node in a different (remote) rack, and the last on a different node in that same remote rack. So in total 2 racks will be used. In a scenario where both racks go down at the same time, the data becomes unavailable; BlockPlacementPolicyRackFaultTolerant helps there by placing the 3 replicas on 3 different racks.
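To make the default behavior concrete, here is a rough sketch (not the actual HDFS code, just a simulation of the rule described above) of how BlockPlacementPolicyDefault would pick 3 targets across two racks; the rack and node names are made up:

```python
import random

def place_block_default(racks, writer_rack, rf=3):
    """Simulate BlockPlacementPolicyDefault for RF=3:
    1st replica in the writer's rack, 2nd replica on a node in a
    remote rack, 3rd replica on a different node in that same remote rack."""
    local = random.choice(racks[writer_rack])
    remote_rack = random.choice([r for r in racks if r != writer_rack])
    remote_a, remote_b = random.sample(racks[remote_rack], 2)
    return [(writer_rack, local),
            (remote_rack, remote_a),
            (remote_rack, remote_b)]

# Two racks with 10 hypothetical DataNodes each, mirroring the setup in this thread.
racks = {
    "rack_01": [f"dn{i:02d}" for i in range(1, 11)],
    "rack_02": [f"dn{i:02d}" for i in range(11, 21)],
}

targets = place_block_default(racks, "rack_01")
print(targets)
print("racks used:", sorted({rack for rack, _ in targets}))  # always both racks
```

Every placement ends up with 1 replica in one rack and 2 in the other, which is why losing a single rack loses no data, but losing both racks at once does.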
So, you can safely set up 2 racks. In case you want to go with BlockPlacementPolicyRackFaultTolerant (for the rare case where both racks go down; note it needs at least 3 racks to place the 3 replicas on 3 different racks), you can follow the below doc:
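As a rough sketch, switching the policy is a one-property change in hdfs-site.xml (property and class names here are from stock Apache HDFS; confirm them against your distribution's documentation before applying):

```xml
<!-- hdfs-site.xml: switch the NameNode's block placement policy so replicas
     are spread across as many racks as possible (stock Apache HDFS names;
     verify for your CDP/CDH version) -->
<property>
  <name>dfs.block.replicator.classname</name>
  <value>org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant</value>
</property>
```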
Created 06-22-2023 05:56 AM
hi @rki_ ,
cool!
so, by my understanding:
if I separate the DataNodes into the two physical racks, as shown in the image below, the blocks will be replicated across DataNodes in both racks, right?
with that, if rack_01 goes down, I won't have any data loss, right?
Created 06-22-2023 06:34 AM
hi @yagoaparecidoti Yes, the blocks for the file would be present on DataNodes in both racks. So if you write a file with RF 3, 1 replica will be present on one rack and the other 2 replicas will be present on different DataNodes of the 2nd rack.
Created 06-22-2023 07:44 AM
I forgot to ask:
today the cluster already has more than 23 million blocks in HDFS.
after configuring the racks in the cluster, will HDFS recognize the racks and start moving blocks between them to increase block availability, or will I have to rebalance HDFS?
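One way to see where things stand after assigning racks (a sketch using standard Apache Hadoop commands; flags are from stock HDFS, worth double-checking on your version):

```shell
# Show which rack each DataNode was assigned to after the change
hdfs dfsadmin -printTopology

# Walk the namespace and report replica placement per rack;
# the fsck summary includes a "Mis-replicated blocks" count for
# blocks whose replicas violate the rack placement policy
hdfs fsck / -files -blocks -racks
```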