
CDH NameNode Java heap bigger than it should be

New Contributor

I am running CDH 6.3.2.

I set the NameNode Java heap size to 70 GB, and almost 9 GB of it is in use, but the block information is as below:

1,757,092 files and directories, 1,340,670 blocks (1,340,670 replicated blocks, 0 erasure coded block groups) = 3,097,762 total filesystem object(s).

Heap Memory used 8.74 GB of 69.65 GB Heap Memory. Max Heap Memory is 69.65 GB.

Non Heap Memory used 130.75 MB of 132.38 MB Committed Non Heap Memory. Max Non Heap Memory is <unbounded>.

At first, the heap memory used fluctuated between 1 GB and 2 GB, but after a few days it fluctuates between 6 GB and 9 GB. I think it should be 3 GB at most.

Can anyone help me figure it out? Thanks very much.

2.png

Also: I didn't find any snapshot objects.

1.png


Expert Contributor

Hi @ROACH 

 

As a rule of thumb, we recommend roughly 1 GB of heap per 1 million blocks. How much memory you actually need depends on your workload, especially on the number of files, directories, and blocks generated in each namespace. The type of hardware (VM or bare metal, etc.) should also be taken into account.

https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_nn_memory_config.html

 

Also have a look at the examples of estimating NameNode heap memory:

https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/hdfs-overview/topics/hdfs-examples-namenode-h...
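For instance, here is a rough back-of-the-envelope sketch (in Python) of that kind of estimate, applying the ~1 GB per 1 million objects rule of thumb to the counts you quoted above. It is only an approximation, not Cloudera's official sizing formula:

```python
# Back-of-the-envelope NameNode heap estimate -- an approximation only,
# based on the "~1 GB of heap per 1 million objects" rule of thumb,
# not Cloudera's official sizing formula.

def estimate_namenode_heap_gb(files_and_dirs, blocks, gb_per_million=1.0):
    """Estimate NameNode heap (GB) from the filesystem object counts
    shown on the NameNode web UI summary."""
    total_objects = files_and_dirs + blocks
    return total_objects / 1_000_000 * gb_per_million

# Counts quoted in the original post:
files_and_dirs = 1_757_092
blocks = 1_340_670
print(f"~{estimate_namenode_heap_gb(files_and_dirs, blocks):.1f} GB")  # ~3.1 GB
```

That lines up with the roughly 3 GB you expected for live metadata.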

 

If write-intensive operations or snapshot operations are performed on the cluster often, then 6-9 GB sounds fine. I would suggest grepping for GC in the NameNode logs; if you see long pauses, say more than 3-5 seconds, that is a good indication that you should increase the heap size.
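Here is a minimal sketch (Python) of what that log scan could look like. It assumes the pause messages logged by Hadoop's JvmPauseMonitor ("... pause of approximately 1234ms") and a hypothetical log path, so adjust both for your environment:

```python
import re
import sys

# Minimal sketch: flag long JVM pauses in a NameNode log.
# The pattern assumes the JvmPauseMonitor message format
# ("pause of approximately 1234ms"); adjust it if your log lines differ.
PAUSE_RE = re.compile(r"pause of approximately (\d+)ms")
THRESHOLD_MS = 3000  # flag pauses longer than ~3 seconds

def long_pauses(log_path, threshold_ms=THRESHOLD_MS):
    """Yield (line_number, pause_ms) for pauses above the threshold."""
    with open(log_path, errors="replace") as log:
        for lineno, line in enumerate(log, start=1):
            match = PAUSE_RE.search(line)
            if match and int(match.group(1)) > threshold_ms:
                yield lineno, int(match.group(1))

if __name__ == "__main__":
    # Example (hypothetical path): python gc_pause_scan.py /var/log/hadoop-hdfs/namenode.log
    for lineno, pause_ms in long_pauses(sys.argv[1]):
        print(f"line {lineno}: JVM pause ~{pause_ms} ms")
```

If you do see such pauses regularly, that supports increasing the heap.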

 

Does that answer your question? Do let us know.

 

Regards,

New Contributor

Dear @kingpin 

Thanks for your reply. I do have a Spark Streaming process that stores files from Kafka to HDFS every two minutes. You can refer to the screenshot below for the data volume.

The Java heap usage is increasing every day. I think it will exceed 70 GB after one month, but the block count is still less than 2 million.

Is there any way to clear the Java heap cache?

The Java heap memory goes back to normal after a restart.

Spark Streaming consuming Kafka:

sparkstreaming.png