
Maximum capacity per DataNode


New Contributor

Is there any upper limit on the maximum capacity per node? Can DataNodes scale to more than 100 TB per node?


Re: Maximum capacity per DataNode

Master Guru
There are no limits in the source code implementation, if that is what you are asking. There are practical limits, such as re-replication bandwidth (which comes into play when a node is lost) and block-reporting load (which matters for low-latency operations), that you will run into as per-node storage grows.

See also our Hardware Requirements guide: https://www.cloudera.com/documentation/enterprise/release-notes/topics/hardware_requirements_guide.h...
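
To make the re-replication limit concrete, here is a rough back-of-envelope sketch in Python. The node capacity, cluster size, and per-node bandwidth figures are assumptions chosen for illustration, not measurements from any particular cluster:

```python
# Hedged sketch (assumed figures): estimate how long a cluster needs to
# re-replicate the data of a single failed DataNode.

def rereplication_hours(node_capacity_tb: float,
                        cluster_nodes: int,
                        per_node_rerep_mb_s: float) -> float:
    """Rough time to restore replication after losing one DataNode.

    Assumes the re-replication work is spread evenly over the surviving
    nodes and each contributes `per_node_rerep_mb_s` of usable network
    and disk bandwidth to re-replication traffic.
    """
    data_mb = node_capacity_tb * 1024 * 1024           # TB -> MB
    aggregate_mb_s = (cluster_nodes - 1) * per_node_rerep_mb_s
    return data_mb / aggregate_mb_s / 3600              # seconds -> hours

# Example: a 100 TB node in a 20-node cluster, ~100 MB/s usable per node.
print(f"{rereplication_hours(100, 20, 100):.1f} hours")  # ~15.3 hours
```

The denser each DataNode, the longer the cluster spends under-replicated after a single node failure, which is one of the practical limits mentioned above.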

Re: Maximum capacity per DataNode

New Contributor

Hi,

For a DataNode with 100 TB of storage, how much RAM is required?

Re: Maximum capacity per DataNode

Rising Star

That's mostly a function of the number of blocks stored on the DataNode. For example, a common rule of thumb is 1 GB of DataNode heap for every one million blocks stored on that DataNode.
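
A minimal sketch of that rule of thumb, assuming a 128 MB block size and perfectly full blocks (both assumptions for illustration only):

```python
# Hedged sketch: DataNode heap from the "~1 GB per 1 million blocks" rule.

def dn_heap_gb(storage_tb: float, block_size_mb: float = 128.0) -> float:
    """Heap estimate assuming every block is full (the ideal block count)."""
    blocks = storage_tb * 1024 * 1024 / block_size_mb   # TB -> MB -> blocks
    return blocks / 1_000_000                            # 1 GB per million blocks

# 100 TB of perfectly packed 128 MB blocks -> ~819,200 blocks -> ~0.8 GB heap.
print(f"{dn_heap_gb(100):.2f} GB")
```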


Re: Maximum capacity per DataNode

Master Guru
Agreed. You shouldn't need more than 3-4 GiB of heap, going by a 3x or 4x factor over the ideal block count for that storage (storage divided by block size).
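
Putting the two replies together, here is a small sketch of that 3x-4x padding, again assuming a 128 MB block size; the padding presumably allows for files smaller than a full block, which inflate the block count:

```python
# Hedged sketch (assumed 128 MB blocks): pad the ideal block count by 3x-4x,
# then apply the 1 GB-per-million-blocks rule of thumb.

ideal_blocks = 100 * 1024 * 1024 / 128        # ~819,200 blocks for 100 TB
for factor in (3, 4):
    heap_gb = ideal_blocks * factor / 1_000_000
    print(f"{factor}x ideal block count -> ~{heap_gb:.1f} GB heap")
# 3x -> ~2.5 GB, 4x -> ~3.3 GB, in line with the 3-4 GiB guidance above.
```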

Re: Maximum capacity per DataNode

New Contributor

Can you provide more information on the reporting-load issue (for low-latency operations) when we have a DataNode with 100 TB+ of storage? We need an archive node for HDFS storage purposes only, with no YARN/Spark running on it. It will only store data based on a storage migration policy. The node's network and storage I/O bandwidth is considered sufficient to handle the larger storage size.