HDFS block capacity chart

Contributor

Hi! I'd like to ask what the 'Block Capacity' chart for HDFS actually means. The tsquery expression is known: 'SELECT block_capacity ...', but I can't figure out where block_capacity is retrieved from or how it is calculated.

The question arose after we increased the 'Java Heap Size of NameNode in Bytes' for our NameNodes. We expected the cluster's block_capacity to grow as well, but it remains the same. Do we need to restart the cluster to get this parameter updated?

1 ACCEPTED SOLUTION

Rising Star

As far as I understand, Block Capacity means the total number of blocks HDFS can hold, regardless of their size. For example, a 128 MB file consumes 1 HDFS block on a DataNode (assuming the HDFS block size is set to 128 MB), but on the NameNode it needs 2 namespace objects: 1 for the file inode and 1 for the block.
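
To make the counting in the 128 MB example concrete, here is a minimal Python sketch. It is a simplified model under stated assumptions: each file costs one inode plus one block object per full or partial block, and directories, replicas and other NameNode bookkeeping are ignored. The function name `namespace_objects` is hypothetical, not a Hadoop API.

```python
def namespace_objects(file_size_bytes: int, block_size_bytes: int = 128 * 1024 * 1024) -> int:
    """Approximate NameNode namespace objects for one file (simplified model).

    Assumption: 1 inode per file plus 1 block object per full or partial block;
    directories, replicas and other bookkeeping are deliberately ignored.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be non-negative and block size positive")
    # Ceiling division: a partially filled last block still counts as a block.
    blocks = -(-file_size_bytes // block_size_bytes)
    return 1 + blocks  # 1 inode + N block objects


MB = 1024 * 1024
print(namespace_objects(128 * MB))  # 2: 1 inode + 1 block
print(namespace_objects(300 * MB))  # 4: 1 inode + 3 blocks
print(namespace_objects(1 * MB))    # 2: a small file still needs a block object
print(namespace_objects(0))         # 1: an empty file has an inode but no blocks
```

Note that the 1 MB file costs the NameNode as many objects as the 128 MB one.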

Since all of these objects are kept in NameNode memory, the block capacity should increase after you increase the NameNode heap size. Yes, you will have to restart HDFS and dependent services to see the increased capacity. However, it may take some time before the chart reflects the new value.

Contributor
Thanks a million for your explanations. Indeed, the maximum block capacity was updated, but only after I changed the NameNode's Java heap size for the second time. It now shows a higher cluster capacity, but the actual value is more than twice what we expected when assuming 1 GB of RAM per 1 million blocks. That's why I'd like to see how this value is actually calculated; I can't find it in CM.
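
One possible explanation for both observations (the value changing only on the second heap increase, and ending up more than twice the 1 GB per million blocks estimate) is sketched below. It rests on an assumption not confirmed in this thread: that CM's block_capacity chart reports the NameNode's `BlockCapacity` metric, which in Apache Hadoop is the size of the in-memory BlocksMap table. As far as I can tell from the Hadoop source, that table is sized by `LightWeightGSet.computeCapacity` from 2% of the JVM's maximum heap, rounded to a power of two; please verify this against your CDH version. The Python functions below are illustrative re-implementations, not CM or Hadoop APIs.

```python
import math

GB = 1024 ** 3


def rule_of_thumb_blocks(heap_bytes: int) -> int:
    """Planning estimate used in this thread: ~1 million blocks per 1 GB of heap."""
    return heap_bytes * 1_000_000 // GB


def blocksmap_capacity(max_heap_bytes: int, percentage: float = 2.0, vm_bits: int = 64) -> int:
    """Re-implementation (assumption) of Hadoop's power-of-two map sizing.

    Mirrors LightWeightGSet.computeCapacity as I understand it: take
    `percentage`% of the max heap, round log2 to the nearest integer, then
    divide by the reference size (8 bytes on 64-bit, 4 bytes on 32-bit JVMs).
    """
    percent_memory = max_heap_bytes * percentage / 100.0
    e1 = int(math.log(percent_memory) / math.log(2.0) + 0.5)  # nearest power of two
    e2 = e1 - (2 if vm_bits == 32 else 3)                      # bytes -> references
    exponent = min(max(e2, 0), 30)                             # clamp to [0, 30]
    return 1 << exponent


for heap_gb in (4, 6, 8, 12):
    heap = heap_gb * GB
    print(f"{heap_gb:>2} GB heap: rule of thumb {rule_of_thumb_blocks(heap):>10,}"
          f" | power-of-two capacity {blocksmap_capacity(heap):>10,}")
```

Under this assumption, a 4 GB heap already yields 8,388,608 (about twice the 4,000,000 rule-of-thumb figure), and 6 GB and 8 GB both yield 16,777,216, so one heap increase can leave the value unchanged. The JVM's reported maximum heap can also be somewhat below the configured value, which may push a result into a lower power-of-two step.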

Rising Star

I am glad it's showing the increased values now. The following link might help, if you haven't already referred to it:

https://www.cloudera.com/documentation/enterprise/5-12-x/topics/admin_nn_memory_config.html

Contributor
Yep, we used it to estimate the required memory. For now our block capacity is OK. Thank you for your help.