
Rack awareness configuration on a Cloudera 6.3.x cluster

Expert Contributor

We are planning to separate the cluster hosts into two physical racks, with 10 hosts in each rack.

 

Our question is: since we only have two physical racks, is it still recommended to use the "Rack" option in Cloudera Manager to separate the hosts?

 

yagoaparecidoti_0-1687357496044.png

 

1 ACCEPTED SOLUTION

Super Collaborator

Hi @yagoaparecidoti. Yes, the blocks of the file will be present on DataNodes in both racks. If you write a file with replication factor (RF) 3, one replica will be placed on a DataNode in one rack and the other two replicas will be placed on different DataNodes in the second rack.


7 REPLIES

Community Manager

Hi @yagoaparecidoti

I'm not an expert, but I checked the documentation and found this:
Specifying Racks for Hosts


If I am reading this correctly, you will have a total of twenty hosts across two racks, so this statement would apply.

Cloudera Manager includes internal rack awareness scripts, but you must specify the racks where the hosts in your cluster are located. If your cluster contains more than 10 hosts, Cloudera recommends that you specify the rack for each host. HDFS, MapReduce, and YARN will automatically use the racks you specify.
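Outside Cloudera Manager, stock Hadoop takes its rack mapping from a topology script configured via net.topology.script.file.name: Hadoop invokes the script with hostnames or IPs as arguments and expects one rack path per line on stdout. A minimal sketch in Python, using made-up hostnames and rack names:

```python
#!/usr/bin/env python3
# Hypothetical topology script: Hadoop passes hostnames/IPs as arguments
# and expects one rack path per line on stdout.
import sys

# Assumed host-to-rack assignment for a 20-host, 2-rack cluster
# (host01..host10 in rack01, host11..host20 in rack02).
RACK_MAP = {f"host{i:02d}": "/rack01" for i in range(1, 11)}
RACK_MAP.update({f"host{i:02d}": "/rack02" for i in range(11, 21)})

DEFAULT_RACK = "/default-rack"  # fallback for hosts not in the map


def resolve(host):
    """Return the rack path for a host, or the default rack if unknown."""
    return RACK_MAP.get(host, DEFAULT_RACK)


if __name__ == "__main__":
    for host in sys.argv[1:]:
        print(resolve(host))
```

Since this cluster is managed by Cloudera Manager, you would normally just assign racks in the CM UI instead of deploying such a script yourself; this is only to show the contract behind the "internal rack awareness scripts" the docs mention.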

I hope this helps.


Cy Jervis, Manager, Community Program

Expert Contributor

Hi @cjervis,

I had already read that documentation before.

Thank you for your help.

What I really wanted to know is whether two racks are enough, or whether three would be recommended.

Super Collaborator

Hi @yagoaparecidoti. With a replication factor of 3, BlockPlacementPolicyDefault puts one replica on the local machine if the writer is on a DataNode (otherwise on a random DataNode in the same rack as the writer), another replica on a node in a different (remote) rack, and the last replica on a different node in that same remote rack. So in total only two racks are used; in a scenario where those two racks go down at the same time, the data becomes unavailable. Using BlockPlacementPolicyRackFaultTolerant instead helps by placing the 3 replicas on 3 different racks.
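To make that placement concrete, here is a toy Python model of the rack selection (not actual HDFS code; rack names are made up): one replica on the writer's rack, the other two together on one remote rack.

```python
import random


def place_replicas_default(writer_rack, racks, rf=3):
    """Toy model of BlockPlacementPolicyDefault rack selection.

    Replica 1 stays on the writer's rack, replica 2 goes to a
    randomly chosen remote rack, and replica 3 shares that same
    remote rack -- so at most two racks hold the block.
    """
    remote = random.choice([r for r in racks if r != writer_rack])
    return [writer_rack, remote, remote][:rf]


# With two racks and RF=3, every write uses both racks but never a third.
racks = ["/rack01", "/rack02"]
print(place_replicas_default("/rack01", racks))
# → ['/rack01', '/rack02', '/rack02']
```

Even with more than two racks configured, the default policy still touches only two of them per block, which is why losing both of those racks simultaneously makes the block unavailable.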

 

So you can safely set up two racks. If you want to go with BlockPlacementPolicyRackFaultTolerant instead (for the rare case where both racks go down at once; it needs at least three racks to spread 3 replicas), you can follow the doc below:

 

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsBlockPlacementPolicies.htm...
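For reference, the policy switch described in that doc is an hdfs-site.xml setting; a sketch of the fragment (verify the property against your CDH 6.3.x version before applying, and make the change via a Cloudera Manager safety valve rather than editing the file by hand):

```xml
<!-- hdfs-site.xml: use the rack-fault-tolerant placement policy so that
     RF=3 replicas are spread across three racks (requires >= 3 racks). -->
<property>
  <name>dfs.block.replicator.classname</name>
  <value>org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant</value>
</property>
```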

Expert Contributor

Hi @rki_,

Cool!

So, if my understanding is correct: if I separate the DataNodes into the two physical racks, as shown in the image below, the blocks will be replicated across DataNodes in both racks, right?

 

yagoaparecidoti_0-1687438562761.png

 

With that, if rack_01 goes down, I won't have any data loss, right?

Super Collaborator

Hi @yagoaparecidoti. Yes, the blocks of the file will be present on DataNodes in both racks. If you write a file with replication factor (RF) 3, one replica will be placed on a DataNode in one rack and the other two replicas will be placed on different DataNodes in the second rack.

Expert Contributor

Hi @rki_,

Excellent, thanks a lot for the help.

Hugs.

Have a nice day!

Expert Contributor

Hi @rki_ / @cjervis,

I forgot to ask:

Today the cluster already has a large number of blocks in HDFS, more than 23 million.

After configuring the racks on the cluster, will HDFS recognize the racks and start moving blocks between them to improve block availability, or will I have to rebalance HDFS myself?