Maximum capacity per DataNode

New Contributor

Is there an upper limit on the storage capacity per DataNode? Can DataNodes scale beyond 100 TB per node?

5 REPLIES

Mentor
There are no limits in the source code implementation, if that is what you are asking. There are practical limits, though, such as re-replication bandwidth (when a node that dense is lost, all of its blocks must be copied elsewhere to restore the replication factor) and block-report load (which hurts low-latency operations); you will run into these once you push well beyond typical per-node storage sizes.

See also our Hardware Requirements guide: https://www.cloudera.com/documentation/enterprise/release-notes/topics/hardware_requirements_guide.h...
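
To make the re-replication concern concrete, here is a rough back-of-the-envelope sketch. The node fill level and the aggregate re-replication throughput are my own assumptions, not figures from this thread:

# Rough estimate of how long the cluster stays under-replicated after
# losing one very dense DataNode. All numbers below are assumptions.
lost_data_tb = 80            # data actually stored on the failed 100 TB node
aggregate_rerepl_gbps = 10   # GB/s the rest of the cluster can devote to re-replication

seconds = lost_data_tb * 1024 / aggregate_rerepl_gbps
print(f"~{seconds / 3600:.1f} hours of degraded redundancy")   # ~2.3 hours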

New Contributor

Hi,

For a DataNode with 100 TB of storage, how much RAM is required?

Expert Contributor

That's mostly a function of the number of blocks stored on the DataNode. A common rule of thumb is 1 GB of DataNode heap for every one million blocks stored on that node.
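
As a quick worked example of that rule (the 128 MB default block size and the assumption that files mostly fill their blocks are mine):

# Ideal block count for 100 TB at an assumed 128 MB block size.
storage_bytes = 100 * 1024**4
block_size = 128 * 1024**2
ideal_blocks = storage_bytes // block_size        # 819,200 blocks
heap_gb = ideal_blocks / 1_000_000                # ~0.8 GB by the 1 GB per 1M blocks rule
print(ideal_blocks, round(heap_gb, 2))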

Mentor
Agreed. You shouldn't need more than 3-4 GiB of heap, going by a 3x or 4x factor over the ideal block count for that storage (total storage divided by the block size).
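
Putting the two replies together, a small sizing sketch could look like this. The block size, the 1 GB-per-million-blocks rule, and the 3-4x padding come from the replies above; the function name and the 4x default are my own choices:

def estimate_dn_heap_gb(storage_bytes,
                        block_size_bytes=128 * 1024 * 1024,
                        small_file_factor=4,
                        gb_per_million_blocks=1.0):
    # The ideal block count assumes every block is full; real files rarely
    # are, so pad by a 3-4x factor as suggested above.
    ideal_blocks = storage_bytes / block_size_bytes
    expected_blocks = ideal_blocks * small_file_factor
    return expected_blocks / 1_000_000 * gb_per_million_blocks

# A 100 TB DataNode lands around 3.3 GB of heap with a 4x factor.
print(round(estimate_dn_heap_gb(100 * 1024**4), 1))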

New Contributor

Can you provide more information on the reporting-load (low-latency operations) issue when a DataNode has 100 TB+ of storage? We need an archive node for HDFS storage only; no YARN or Spark will run on it. It will only store data moved there by a storage migration policy. The node's network and storage I/O bandwidth are assumed to be sufficient for the larger capacity.