Namenode heapsize

New Contributor

What are the parameters that will be affected if the NameNode heap size is changed? And what other components need to be looked at for capacity planning?


Re: Namenode heapsize

Contributor

Are you going to be increasing or decreasing the heap size? If you set it too high, the NameNode(s) may fail to even start, because the memory they request may already be in use by other services.


Re: Namenode heapsize

Rising Star

@himanshu ghildiyal

NameNode heap size depends on many factors such as the number of files, the number of blocks, and the load on the system.

Environment variables used:

HADOOP_HEAPSIZE - sets the JVM heap size for all Hadoop project servers such as HDFS, YARN, and MapReduce.

HADOOP_NAMENODE_OPTS - is specific to the NameNode and sets all of its JVM flags; its Xmx value overrides the HADOOP_HEAPSIZE value for the NameNode process.
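As a minimal sketch, both variables are typically set in hadoop-env.sh. The 4 GB figures below are illustrative placeholders, not recommendations, and variable names vary by Hadoop version (Hadoop 3, for instance, replaces HADOOP_HEAPSIZE with HADOOP_HEAPSIZE_MAX):

    # hadoop-env.sh (Hadoop 2.x style) -- heap sizes here are placeholders
    export HADOOP_HEAPSIZE=4096   # interpreted as MB; applies to all Hadoop daemons

    # NameNode-specific JVM flags; the -Xmx here takes effect for the
    # NameNode process in place of the generic HADOOP_HEAPSIZE value
    export HADOOP_NAMENODE_OPTS="-Xms4g -Xmx4g -XX:+HeapDumpOnOutOfMemoryError ${HADOOP_NAMENODE_OPTS}"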

The following table illustrates how to estimate the NameNode heap memory used. Each file (inode) and each block is a namespace object on the NameNode, and each object consumes roughly 150 bytes of heap.

Scenarios 1, 2, and 3 each have 1 GB (1024 MB) of data on disk, but sliced into differently sized files. Scenarios 1 and 2 have files that are a multiple of the block size and require the least memory. Scenario 3 does not, and fills the heap with unnecessary namespace objects.

                    Scenario 1          Scenario 2          Scenario 3
File Size           1 GB (1024 MB)      1 GB (1024 MB)      1 GB (1024 MB)
File Split          1 * 1024 MB         8 * 128 MB          1024 * 1 MB
Blocks              8                   8                   1024
                    (1024 MB / 128 MB)  (1024 MB / 128 MB)  (1024 MB / 1 MB)
Total Objects       1 + 8 = 9           8 + 8 = 16          1024 + 1024 = 2048
Total Heap Memory   9 * 150 bytes       16 * 150 bytes      2,048 * 150 bytes
                    = 1,350 bytes       = 2,400 bytes       = 307,200 bytes
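To make the arithmetic concrete, here is a small shell sketch of the Scenario 3 row (the variable names are ours; 150 bytes per object is the rough estimate used in the table):

    # Scenario 3: 1024 files of 1 MB each, 128 MB block size
    files=1024
    blocks=1024                       # each 1 MB file still occupies its own block
    objects=$((files + blocks))       # one namespace object per file and per block
    echo "$((objects * 150)) bytes"   # 150 bytes/object -> 307200 bytes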

The following table illustrates how to estimate the NameNode heap memory needed.

In this example, memory is estimated by considering the capacity of a cluster. Values are rounded. Both clusters physically store 4800 TB, or approximately 36 million block files (at the default block size). Replication factor determines how many namespace blocks represent these block files.

                             Cluster A                        Cluster B
Cluster Size                 200 hosts * 24 TB = 4800 TB      200 hosts * 24 TB = 4800 TB
Replication Factor           1                                3
Block Size                   128 MB                           128 MB
Cluster Capacity (MB)        200 * 24,000,000 MB              200 * 24,000,000 MB
                             = 4,800,000,000 MB (4800 TB)     = 4,800,000,000 MB (4800 TB)
Storage per Block            128 MB * 1 = 128 MB              128 MB * 3 = 384 MB
Cluster Capacity (blocks)    4,800,000,000 MB / 128 MB        4,800,000,000 MB / 384 MB
                             = 36,000,000 blocks              = 12,000,000 blocks

At capacity, with the recommended allocation of 1 GB of heap memory per million blocks, Cluster A needs 36 GB of maximum heap space and Cluster B needs 12 GB.

Both Cluster A and Cluster B store the same number of block files. In Cluster A, however, each block file is unique and is represented by one block on the NameNode; in Cluster B, only one-third of the block files are unique and two-thirds are replicas.
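The same rule of thumb can be checked in shell (a sketch using the example values above; the exact block count comes out slightly higher than the rounded table figure):

    # Cluster B: 200 hosts * 24 TB, replication factor 3, 128 MB blocks
    hosts=200
    mb_per_host=24000000                    # 24 TB per host, in MB
    capacity_mb=$((hosts * mb_per_host))    # 4,800,000,000 MB
    blocks=$((capacity_mb / (128 * 3)))     # 12,500,000 unique blocks (table rounds to 12,000,000)
    echo "$((blocks / 1000000)) GB heap"    # 1 GB per million blocks -> 12 GB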

Re: Namenode heapsize

New Contributor

Hi,

My question is: the NameNode heap memory has already been changed. What other parameters now need to be changed for HDFS, MapReduce, etc.?