Support Questions

Find answers, ask questions, and share your expertise

Is it best to increase the disk or datanode to address space consumption issue?

Expert Contributor

Hello All,

We have 12 disks configured on each data node, each 7.3 TB in size, which brings the total capacity of each data node to 87.6 TB.

Disk size on each data node:

/dev/sdb1 7.3T /data1
/dev/sdc1 7.3T /data2
/dev/sdd1 7.3T /data3
/dev/sde1 7.3T /data4
/dev/sdf1 7.3T /data5
/dev/sdg1 7.3T /data6
/dev/sdh1 7.3T /data7
/dev/sdi1 7.3T /data8
/dev/sdj1 7.3T /data9
/dev/sdk1 7.3T /data10
/dev/sdl1 7.3T /data11
/dev/sdm1 7.3T /data12
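As a quick sanity check on the totals quoted above, the following sketch parses the mount listing and sums the per-disk sizes (the listing is copied from this post; the parsing helper is just for illustration):

```python
# Mount listing copied from the post: device, size, mount point.
listing = """
/dev/sdb1 7.3T /data1
/dev/sdc1 7.3T /data2
/dev/sdd1 7.3T /data3
/dev/se1 7.3T /data4
/dev/sdf1 7.3T /data5
/dev/sdg1 7.3T /data6
/dev/sdh1 7.3T /data7
/dev/sdi1 7.3T /data8
/dev/sdj1 7.3T /data9
/dev/sdk1 7.3T /data10
/dev/sdl1 7.3T /data11
/dev/sdm1 7.3T /data12
"""

def total_capacity_tb(df_text: str) -> float:
    """Sum the size column (e.g. '7.3T') across all listed disks."""
    total = 0.0
    for line in df_text.strip().splitlines():
        _device, size, _mount = line.split()
        total += float(size.rstrip("T"))
    return round(total, 1)

print(total_capacity_tb(listing))  # 12 disks x 7.3 TB = 87.6 TB per node
```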

Due to data growth we have already come close to the threshold value on each data node. We therefore need suggestions/best practices on whether it is better to add more disks of the same size or to add more data nodes, keeping in mind that we don't want to compromise on server load. If adding nodes is the better option, we will go with that.

We are currently on CDP 7, and I want to make sure we follow best practice so that our space concerns are addressed in the long run.

Kindly share if anyone has run into this situation and what the best approach would be.

1 ACCEPTED SOLUTION

Expert Contributor

After thorough research on space capacity and best practices, I have recommendations to share and can mark this as resolved.

As a general recommendation, a data node's configured DFS capacity should not exceed 100 TB. Since our current setup is already at 87.6 TB per node, it is recommended to scale the cluster horizontally by adding more data nodes with homogeneous configurations. This way we add compute as well as additional I/O capacity to the cluster, rather than concentrating more storage behind the same CPUs and network interfaces.
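To see why adding disks runs out of headroom quickly, here is a small capacity-planning sketch. The 7.3 TB disk size comes from our layout; the 100 TB figure is the per-node guideline mentioned above, and the helper function name is just illustrative:

```python
DISK_TB = 7.3          # size of each data disk, from our node layout
GUIDELINE_TB = 100.0   # suggested upper bound on per-node DFS capacity

def node_capacity_tb(num_disks: int) -> float:
    """Total raw capacity of one data node with num_disks disks."""
    return round(num_disks * DISK_TB, 1)

for disks in (12, 13, 14):
    cap = node_capacity_tb(disks)
    status = "OK" if cap <= GUIDELINE_TB else "exceeds guideline"
    print(f"{disks} disks -> {cap} TB per node ({status})")
```

With 12 disks we are at 87.6 TB; one more disk brings a node to 94.9 TB, and a second pushes it to 102.2 TB, past the guideline. So vertical growth buys at most one disk's worth of space before the per-node limit is hit, whereas adding nodes keeps every node comfortably under it.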

 
