Created 11-06-2015 07:34 PM
A partner is working to offer Hadoop in a private cloud, and we are working with them on sizing their master nodes. One question that came up is how large the fsimage and edit log files typically get in small/medium and large Hortonworks customer implementations.
Based on this, the partner can procure sufficient storage for the master nodes.
Any experiences to share about file sizes?
Created 11-06-2015 07:40 PM
Similar question here.
I've copied the answer below.
Each HDFS block occupies ~250 bytes of RAM on the NameNode (NN), and each file and directory requires roughly another ~250 bytes. The default block size is 128 MB, so you can calculate how many files a given amount of RAM will support.

To guarantee persistence of the filesystem metadata, the NN also keeps a copy of its in-memory structures on disk in the NN directories, which hold the fsimage and the edit logs. The edit logs capture every change made to HDFS (such as new files and directories); think of the redo logs that most RDBMSs use. The fsimage is a full snapshot of the metadata state. The fsimage will not grow beyond the memory allocated to the NN, and the edit logs are rotated once they reach a specific size.

It is always safest to allocate significantly more capacity to the NN directory than needed, for example 4 times what is configured for NN memory. If disk capacity isn't an issue, allocate 500 GB or more if you can spare it (more capacity is very common, especially when setting up a 3+3 or 4+4 RAID 10 mirrored set). Setting up RAID at the disk level, such as RAID 1 or RAID 1+0, makes sense, and with RAID in place a single NN directory is just fine.
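To make the arithmetic in the answer concrete, here is a rough sizing sketch. It only encodes the rules of thumb quoted above (~250 bytes per block, ~250 bytes per file or directory, 128 MB default block size, NN directory at 4x the configured NN memory). The function names are hypothetical, and estimating blocks from an *average* file size is a simplifying assumption; real clusters with many small files will need more blocks per byte stored.

```python
# Rules of thumb from the answer above; approximations, not exact HDFS figures.
BYTES_PER_BLOCK = 250            # ~250 bytes of NN RAM per block
BYTES_PER_FILE_OR_DIR = 250      # ~250 bytes of NN RAM per file or directory
DEFAULT_BLOCK_SIZE = 128 * 1024**2  # 128 MB default block size
DISK_MULTIPLIER = 4              # NN dir capacity ~4x configured NN memory


def blocks_for_file(file_size_bytes, block_size=DEFAULT_BLOCK_SIZE):
    """Blocks one file occupies: ceiling division, so a partial last block counts."""
    return -(-file_size_bytes // block_size)


def estimate_nn_heap_bytes(num_files, num_dirs, avg_file_size_bytes,
                           block_size=DEFAULT_BLOCK_SIZE):
    """Hypothetical estimate of NN metadata RAM, assuming every file has the average size."""
    total_blocks = num_files * blocks_for_file(avg_file_size_bytes, block_size)
    return (total_blocks * BYTES_PER_BLOCK
            + (num_files + num_dirs) * BYTES_PER_FILE_OR_DIR)


def suggested_nn_dir_bytes(configured_heap_bytes):
    """Disk to allocate for the NN directory: a multiple of the configured NN memory."""
    return configured_heap_bytes * DISK_MULTIPLIER


if __name__ == "__main__":
    # Example: 10 million files of ~200 MB each (2 blocks each) in 1 million directories.
    heap = estimate_nn_heap_bytes(10_000_000, 1_000_000, 200 * 1024**2)
    print(f"Estimated metadata RAM: {heap / 1e9:.2f} GB")
    # If the NN is then configured with a 16 GB heap, size the NN directory from that.
    print(f"Suggested NN dir size: {suggested_nn_dir_bytes(16 * 1024**3) / 1024**3:.0f} GiB")
```

In this example the metadata works out to about 7.75 GB of RAM, and a 16 GB configured heap suggests roughly 64 GiB for the NN directory. The disk figure is deliberately tied to the *configured* NN memory rather than the estimate, since the answer notes the fsimage will not grow beyond what the NN is allocated.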
Created 11-06-2015 08:05 PM
Dan, the partner has limited storage capacity (currently 400 GB SSDs), and the SSDs will be set up as JBOD, not RAID.