Created 05-11-2017 03:20 AM
Hi,
Can someone explain how the namenode heap size is calculated in the URL below?
Created 05-11-2017 11:53 AM
For 40-50 million files, the block size is 256 MB, which is twice 128 MB. Naturally, the number of blocks created per file decreases, and in turn the block details stored on the namenode are also reduced. That's why only 24 GB is recommended. If you increase the block size to an even higher value, the recommended heap size would decrease further.
The block size is inversely proportional to the recommended heap size. I hope this helps.
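A minimal sketch of that inverse relationship, assuming a fixed amount of raw data and the rough rule of thumb (used later in this thread) of about 1 GB of namenode heap per million blocks:

```python
# Sketch: for a fixed amount of data, doubling the block size halves the
# block count, and namenode heap scales with the block count.
# The 1 GB per million blocks figure is the thread's rule of thumb.
data_mb = 6_000_000_000  # example: 6000 TB of raw data, in MB
for block_size_mb in (128, 256, 512):
    blocks = data_mb // block_size_mb
    heap_gb = blocks / 1_000_000  # ~1 GB heap per million blocks
    print(block_size_mb, blocks, round(heap_gb))
# 128 MB blocks -> ~47 GB; 256 MB -> ~23 GB; 512 MB -> ~12 GB
```

Note how the 256 MB row lands near the 24 GB recommendation mentioned above.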
Created 05-11-2017 04:40 AM
One namenode object uses about 150 bytes to store metadata. Assume a 128 MB block size; you should increase the block size if you have a lot of data (PB scale, or even 500+ TB in some cases).
Assume a file size of 150 MB. The file will be split into two blocks: the first of 128 MB and the second of 22 MB. For this file, the namenode will store the following information:
1 file inode and 2 blocks.
That is 3 namenode objects, which will take about 450 bytes on the namenode. By contrast, at a 1 MB block size the same file would have 150 blocks: one inode plus 150 blocks means 151 namenode objects for the same data, or 151 x 150 bytes = 22,650 bytes. Even worse would be 150 files of 1 MB each: that would require 150 inodes and 150 blocks, i.e. 300 x 150 bytes = 45,000 bytes. See how this all changes. That's why we don't recommend small files for Hadoop.
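The arithmetic above can be sketched as follows. The ~150 bytes per namenode object is the estimate from this post, not an exact HDFS constant:

```python
# Rough sketch of the namenode-object arithmetic for this thread.
# Assumes ~150 bytes per namenode object (inode or block), per the post.
import math

OBJ_BYTES = 150  # approximate bytes of namenode heap per object

def namenode_bytes(file_size_mb, block_size_mb, num_files=1):
    """Approximate namenode heap used by num_files files of file_size_mb each."""
    blocks_per_file = math.ceil(file_size_mb / block_size_mb)
    objects = num_files * (1 + blocks_per_file)  # 1 inode + N blocks per file
    return objects * OBJ_BYTES

print(namenode_bytes(150, 128))             # one 150 MB file, 128 MB blocks -> 450
print(namenode_bytes(150, 1))               # one 150 MB file, 1 MB blocks -> 22650
print(namenode_bytes(1, 1, num_files=150))  # 150 files of 1 MB each -> 45000
```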
Now, assuming 128 MB blocks, on average 1 GB of namenode memory is required per 1 million blocks.
Now let's do this calculation at PB scale.
Assume 6,000 TB of data. That's a lot.
Imagine 30 TB of capacity per node. That requires 200 nodes.
Assume a 128 MB block size and a replication factor of 3.
Cluster capacity in MB = 30 TB x 1,000 (to GB) x 1,000 (to MB) x 200 nodes = 6,000,000,000 MB (6,000 TB)
How many blocks can we store in this cluster?
6,000,000,000 MB / 128 MB = 46,875,000 (that's about 47 million blocks)
Assuming 1 GB of memory per million blocks, you need a mere 46,875,000 blocks / 1,000,000 blocks per GB ≈ 47 GB of memory.
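The cluster-scale estimate above, as a sketch (raw capacity, with no adjustment for replication; a later reply questions that):

```python
# Sketch of the cluster-scale namenode heap estimate from this post.
nodes = 200
capacity_per_node_mb = 30 * 1000 * 1000    # 30 TB per node, decimal units
block_size_mb = 128

cluster_mb = nodes * capacity_per_node_mb  # 6,000,000,000 MB (6000 TB)
blocks = cluster_mb // block_size_mb       # 46,875,000 (~47 million blocks)
heap_gb = blocks / 1_000_000               # ~1 GB per million blocks -> ~47 GB
print(cluster_mb, blocks, round(heap_gb))
```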
Namenodes with 64-128 GB memory are quite common. You can do a few things here.
1. Increase the block size to 256 MB; that will save you quite a bit of namenode space. At large scale, you should do that regardless.
2. Get more memory for the namenode, perhaps 256 GB (I've never had a customer go this far; maybe someone else can chime in).
Finally, read the following.
https://issues.apache.org/jira/browse/HADOOP-1687
and see your link (notice that for 40-50 million files only 24 GB is recommended, about half of our calculation, probably because the block size assumed at that scale is 256 MB rather than 128 MB).
Created 11-21-2019 01:03 PM
Is this right? 6,000,000,000 MB / 128 MB = 46,875,000 (that's 47 million blocks)
Shouldn't it rather be 6,000,000,000 MB / (128 x 3) MB = 15,625,000 (approximately 16 million blocks), so that the namenode memory required is approximately 16 GB?
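This reply's replication-adjusted version, as a sketch: with a replication factor of 3, the raw capacity holds only a third as much unique data, so the number of distinct blocks the namenode tracks drops by 3x.

```python
# Sketch of the replication-adjusted estimate from this reply.
cluster_mb = 6_000_000_000  # raw cluster capacity in MB (6000 TB)
block_size_mb = 128
replication = 3

# Each unique block occupies block_size * replication of raw capacity.
unique_blocks = cluster_mb // (block_size_mb * replication)  # 15,625,000
heap_gb = unique_blocks / 1_000_000  # ~1 GB per million blocks -> ~16 GB
print(unique_blocks, round(heap_gb))
```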
Created on 11-22-2019 10:56 PM - edited 11-22-2019 11:05 PM
Hi @mqureshi ,
you have explained this beautifully.
But how does the replication of blocks impact this calculation? Please explain.
Regards.
Created 05-12-2017 03:41 AM
Thanks @mqureshi & @Bala Vignesh N V