Support Questions

yagoaparecidoti · 06-21-2023

We are planning to split the cluster hosts across 2 physical racks, 10 hosts per rack.

Our question is:

Since we only have two physical racks, is it still recommended to use the "rack" option in Cloudera Manager to separate the hosts?

cjervis · 06-21-2023

Hi @yagoaparecidoti,

I'm not an expert, but I checked the documentation and found this:
Specifying Racks for Hosts

If I'm reading this correctly, you will have a total of twenty hosts across two racks, so this statement would apply.

Cloudera Manager includes internal rack awareness scripts, but you must specify the racks where the hosts in your cluster are located. If your cluster contains more than 10 hosts, Cloudera recommends that you specify the rack for each host. HDFS, MapReduce, and YARN will automatically use the racks you specify.
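Cloudera Manager builds the host-to-rack mapping itself from the rack you assign to each host, so nothing like this has to be written on a CM-managed cluster. Purely to illustrate what that mapping is, here is a minimal Python sketch in the style of a plain-Hadoop topology script (the kind referenced by the `net.topology.script.file.name` property): it receives host names and prints one rack path per host. The host names, rack names, and the `/default-rack` fallback for unknown hosts are assumptions for this example.

```python
import sys

# Hypothetical host-to-rack table for the layout in this thread:
# 20 hosts, 10 per rack. Host and rack names are illustrative only.
DEFAULT_RACK = "/default-rack"
RACKS = {f"host{i:02d}": "/rack_01" for i in range(1, 11)}
RACKS.update({f"host{i:02d}": "/rack_02" for i in range(11, 21)})


def resolve(hosts):
    """Return one rack path per host, falling back to DEFAULT_RACK."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Topology-script style: hosts come in as arguments, racks go out
    # on stdout in the same order, separated by spaces.
    print(" ".join(resolve(sys.argv[1:])))
```

Saved as a hypothetical `topology.py`, running `python topology.py host01 host15` prints `/rack_01 /rack_02`.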

I hope this helps.

Cy Jervis, Manager, Community Program

yagoaparecidoti · 06-21-2023

Hi @cjervis,

I had already read this documentation.

Thank you for your help.

What I really wanted to know is whether two racks are enough or whether three would be recommended.

rki_ · 06-22-2023

Hi @yagoaparecidoti, with a replication factor of 3, BlockPlacementPolicyDefault puts one replica on the local machine if the writer is on a DataNode (otherwise on a random DataNode in the same rack as the writer), another replica on a node in a different (remote) rack, and the last one on a different node in that same remote rack. So each block uses 2 racks in total, which means that if 2 racks go down at the same time, data can become unavailable. BlockPlacementPolicyRackFaultTolerant helps in that scenario by placing the 3 replicas on 3 different racks.
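To make the default placement rules concrete, here is a simplified Python sketch of RF-3 placement. It is not the actual HDFS implementation: it ignores node load and free space, and when the writer is not a DataNode it simply picks a random DataNode instead of modelling the writer's rack. Rack and node names are hypothetical. It shows why each block ends up on exactly two racks, even when more are available.

```python
import random


def node_to_rack(racks):
    """Invert {rack: [DataNodes]} into {DataNode: rack}."""
    return {n: r for r, nodes in racks.items() for n in nodes}


def place_default(writer, racks, rng):
    """Simplified sketch of default RF-3 placement."""
    node_rack = node_to_rack(racks)
    if writer in node_rack:
        first = writer  # replica 1 on the writer's own DataNode
    else:
        # Simplification: ignore the writer's rack and pick any DataNode.
        first = rng.choice(sorted(node_rack))
    local_rack = node_rack[first]
    # Replicas 2 and 3: two different DataNodes on one remote rack.
    remote_rack = rng.choice([r for r in racks if r != local_rack])
    second, third = rng.sample(racks[remote_rack], 2)
    return [first, second, third]


def racks_used(replicas, racks):
    """Return the set of racks holding at least one replica."""
    node_rack = node_to_rack(racks)
    return {node_rack[n] for n in replicas}


if __name__ == "__main__":
    racks = {f"/rack_0{r}": [f"dn{r}{i}" for i in range(4)] for r in range(1, 4)}
    replicas = place_default("dn10", racks, random.Random(0))
    # Always two racks, even though three are available.
    print(replicas, len(racks_used(replicas, racks)))
```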

So you can safely set up 2 racks. If you want to use BlockPlacementPolicyRackFaultTolerant instead (for the rare cases where two racks go down at the same time; note that putting 3 replicas on 3 different racks requires at least 3 racks), you can follow the doc below:

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsBlockPlacementPolicies.htm...
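Per the linked Hadoop documentation, that policy is enabled by setting `dfs.block.replicator.classname` to `org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant` in hdfs-site.xml. The Python sketch below only illustrates the idea behind it (spread replicas over as many racks as possible) with a simple shuffled round-robin; it is not the real class, and the rack and node names are hypothetical. It also shows that with only two racks, three replicas can still span at most two racks.

```python
import random


def place_rack_fault_tolerant(racks, replication, rng):
    """Simplified sketch: spread replicas over as many racks as possible.

    racks maps a rack name to its DataNodes. Racks are visited round-robin
    in a shuffled order, taking one unused DataNode per rack per pass, so
    with at least `replication` racks every replica lands on its own rack.
    """
    order = list(racks)
    rng.shuffle(order)
    # Shuffled copy of each rack's nodes; pop() takes an unused node.
    pools = {r: rng.sample(racks[r], len(racks[r])) for r in order}
    chosen = []
    while len(chosen) < replication:
        progressed = False
        for r in order:
            if pools[r] and len(chosen) < replication:
                chosen.append(pools[r].pop())
                progressed = True
        if not progressed:
            raise ValueError("not enough DataNodes for this replication factor")
    return chosen


if __name__ == "__main__":
    rng = random.Random(1)
    three = {f"/rack_0{r}": [f"dn{r}{i}" for i in range(4)] for r in range(1, 4)}
    two = {f"/rack_0{r}": [f"dn{r}{i}" for i in range(4)] for r in range(1, 3)}
    # Three racks: one replica per rack. Two racks: still only two racks.
    print(place_rack_fault_tolerant(three, 3, rng))
    print(place_rack_fault_tolerant(two, 3, rng))
```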

yagoaparecidoti · 06-22-2023

Hi @rki_,

Cool!

So, as I understand it:

If I separate the DataNodes into the two physical racks, as shown in the image below, the blocks will be replicated across the DataNodes in both racks, right?

With that, if rack_01 goes down, I won't have any data loss, right?

rki_ · 06-22-2023

Hi @yagoaparecidoti, yes. Each block of the file will have replicas on DataNodes in both racks. So if you write a file with RF 3, one replica of each block will be on one rack and the other 2 replicas will be on different DataNodes of the second rack. If either rack goes down, at least one replica of every block is still available on the other rack.
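That reasoning can be checked exhaustively for the layout in this thread. The Python sketch below assumes hypothetical DataNode names (dn01 to dn10 on /rack_01, dn11 to dn20 on /rack_02), enumerates every placement that follows the "1 replica on one rack, 2 on different DataNodes of the other rack" rule, and counts how many replicas survive the loss of either rack. It only models a whole-rack failure, not extra disk or node failures happening at the same time.

```python
from itertools import combinations

# Hypothetical layout from this thread: 10 DataNodes per rack.
RACKS = {
    "/rack_01": [f"dn{i:02d}" for i in range(1, 11)],
    "/rack_02": [f"dn{i:02d}" for i in range(11, 21)],
}
NODE_RACK = {n: r for r, nodes in RACKS.items() for n in nodes}


def two_rack_placements():
    """Yield every RF-3 placement with 1 replica on one rack, 2 on the other."""
    for local, remote in (("/rack_01", "/rack_02"), ("/rack_02", "/rack_01")):
        for first in RACKS[local]:
            for second, third in combinations(RACKS[remote], 2):
                yield (first, second, third)


def replicas_left(replicas, failed_rack):
    """Count replicas that are not on the failed rack."""
    return sum(1 for n in replicas if NODE_RACK[n] != failed_rack)


if __name__ == "__main__":
    worst = min(
        replicas_left(p, failed)
        for p in two_rack_placements()
        for failed in RACKS
    )
    # prints: minimum replicas left after losing one rack: 1
    print("minimum replicas left after losing one rack:", worst)
```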

yagoaparecidoti · 06-22-2023

Hi @rki_,

Excellent, thanks a lot for the help.

Hugs.

Have a nice day!

yagoaparecidoti · 06-22-2023

Hi @rki_ / @cjervis,

I forgot to ask:

Today the cluster already has a very large number of blocks in HDFS, more than 23 million.

After configuring the racks in the cluster, will HDFS recognize the racks and start moving the existing blocks between them to increase block availability, or will I have to rebalance HDFS?
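The thread does not answer this question, so the following is only a rough aid and says nothing about what HDFS or the Balancer does on its own. It is a Python sketch that, given a block-to-DataNode listing (a hypothetical input format you would have to build yourself) and the host-to-rack map, flags blocks whose replicas all sit on a single rack, i.e. blocks that do not yet match the two-rack placement described earlier in the thread. Block IDs and host names are made up.

```python
from collections import Counter


def racks_of(nodes, node_rack):
    """Set of racks holding the given replicas."""
    return {node_rack[n] for n in nodes}


def single_rack_blocks(block_locations, node_rack):
    """Return IDs of blocks whose replicas all sit on one rack.

    block_locations: hypothetical dict block_id -> list of DataNode hosts.
    Blocks with a single replica are flagged too, since they cannot span
    two racks.
    """
    return [b for b, nodes in block_locations.items()
            if len(racks_of(nodes, node_rack)) < 2]


def rack_histogram(block_locations, node_rack):
    """Count how many blocks span 1, 2, ... racks."""
    return Counter(len(racks_of(nodes, node_rack))
                   for nodes in block_locations.values())


if __name__ == "__main__":
    node_rack = {"dn01": "/rack_01", "dn02": "/rack_01", "dn03": "/rack_01",
                 "dn11": "/rack_02", "dn12": "/rack_02"}
    blocks = {"blk_1": ["dn01", "dn11", "dn12"],
              "blk_2": ["dn01", "dn02", "dn03"],
              "blk_3": ["dn11", "dn12", "dn02"]}
    print(single_rack_blocks(blocks, node_rack))  # prints ['blk_2']
    print(sorted(rack_histogram(blocks, node_rack).items()))  # prints [(1, 1), (2, 2)]
```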

Cloudera Community

rack awareness configuration on cloudera 6.3.x cluster