Any recommendation on how to partition disk space for a Namenode?

Rising Star
 
1 ACCEPTED SOLUTION

Expert Contributor

@hfaouaz@hortonworks.com - Each HDFS block occupies ~250 bytes of RAM on the NameNode (NN), plus an additional ~250 bytes for each file and directory. The default block size is 128 MB, so you can calculate how many files a given amount of RAM will support. To guarantee persistence of the filesystem metadata, the NN also has to keep a copy of its in-memory structures on disk. That is what the NN dirs you mentioned are for: they hold the fsimage and the edit logs. The edit logs capture all changes happening to HDFS (such as new files and directories); think of the redo logs that most RDBMSs use. The fsimage is a full snapshot of the metadata state. The fsimage file will not grow beyond the memory allocated to the NN, and the edit logs get rotated once they hit a specific size. It is always safest to allocate significantly more capacity for the NN directory than needed, say 4 times what is configured for NN memory. If disk capacity isn't an issue, allocate 500 GB+ if you can spare it (more capacity is very common, especially when setting up a 3+3 or 4+4 RAID 10 mirrored set). Setting up RAID at the disk level, such as RAID 1 or RAID 1/0, makes sense, and with RAID in place a single directory is just fine.
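
To make the arithmetic above concrete, here is a rough Python sketch. It only encodes the rules of thumb from this answer (~250 bytes per block, ~250 bytes per file or directory, 128 MB blocks, disk at 4 times the configured NN memory). The function names and the sample workload are made up for illustration and are not part of any Hadoop tool, so treat the numbers as a starting estimate rather than a guarantee.

```python
# Rough NameNode sizing sketch using the rule-of-thumb figures from the answer.
# All names below are hypothetical helpers, not Hadoop APIs.

BYTES_PER_BLOCK = 250             # ~250 bytes of NN RAM per HDFS block
BYTES_PER_FILE_OR_DIR = 250       # ~250 bytes of NN RAM per file or directory
BLOCK_SIZE = 128 * 1024**2        # default HDFS block size: 128 MB
DISK_MULTIPLIER = 4               # "say 4 times what is configured for NN memory"


def blocks_for_file(file_size_bytes, block_size=BLOCK_SIZE):
    """Number of blocks a file occupies (integer ceiling division)."""
    return (file_size_bytes + block_size - 1) // block_size


def estimate_nn_memory_bytes(num_files, num_dirs, avg_file_size_bytes):
    """Estimate NN RAM for a namespace where every file has the average size."""
    total_blocks = num_files * blocks_for_file(avg_file_size_bytes)
    return (total_blocks * BYTES_PER_BLOCK
            + (num_files + num_dirs) * BYTES_PER_FILE_OR_DIR)


def recommended_nn_dir_bytes(nn_memory_bytes, multiplier=DISK_MULTIPLIER):
    """Disk capacity to set aside for the NN directory."""
    return nn_memory_bytes * multiplier


if __name__ == "__main__":
    # Assumed sample workload: 10 million files of 256 MB each, 1 million dirs.
    ram = estimate_nn_memory_bytes(10_000_000, 1_000_000, 256 * 1024**2)
    print(f"Estimated NN metadata RAM: {ram / 1024**3:.1f} GiB")

    # Assumed configured NN memory of 64 GiB.
    disk = recommended_nn_dir_bytes(64 * 1024**3)
    print(f"Suggested NN dir capacity: {disk / 1024**3:.0f} GiB")
```

Note that the disk figure is derived from the memory configured for the NN, not from the estimated metadata size, which matches the "4 times what is configured" guidance above.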


3 REPLIES 3

Master Mentor

@hfaouaz@hortonworks

Please see this blog

link

ext4 is the successor to ext3. ext4 has better performance with large files. ext4 also introduced delayed allocation of data, which adds a bit more risk with unplanned server outages while decreasing fragmentation and improving performance.

Rising Star

This is a great blog. What about the partition size for the namedirs? Should there be multiple namedirs?
