
Data Node Pause Duration

Rising Star

Hello,


On our data nodes, we are increasingly getting alerts related to Data Node Pause Duration. So far, this is happening on a single data node out of the nine data nodes.

Following is the error captured from the DN logs:

2020-10-27 16:20:05,140 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1821ms
GC pool 'ParNew' had collection(s): count=1 time=2075ms

 

The current Java Heap Size of DataNode in Bytes is 6 GB.

 

CM / CDH – 5.16.x

 

Any help is appreciated.

 

Regards

Amn


9 REPLIES

Master Guru

@Amn_468 This is due to the Java Heap Size. 

Let's say the default setting for the namenode_java_heapsize is 1GB. Cloudera recommends having 1GB of heap space for every 1M blocks in a cluster.
 

If the data in your cluster is growing rapidly, factor in the potential future number of blocks your cluster will require when determining the heap size, so you can avoid having to restart the NameNode later. The setting can only be changed by restarting the NameNode.

Calculating the Required Heap Size

  1. Determine the number of blocks in the cluster. This information is available on the namenode web UI under the Summary section, with information like the following:
    117,387 files and directories, 56,875 blocks = 174,262 total filesystem object(s).
    Alternatively, the information is available from the output of the fsck command:
     Total size:    9958827546 B (Total open files size: 93 B)
     Total dirs:    20397
     Total files:   57993
     Total symlinks:        0 (Files currently being written: 1)
     Total blocks (validated):    56874 (avg. block size 175103 B) (Total open file blocks (not validated): 1)
     ...
  2. Given the number of blocks, allocate 1GB of heap space for each 1M blocks, plus some additional memory for growth. For example, if there are 6,543,567 blocks, you need 6.5GB of heap to cover the current cluster size, but 8GB would be a sensible setting to allow for growth of the cluster.
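As a rough sketch only (it assumes the fsck output format shown above; the awk field position may differ between versions), the same calculation can be scripted:

# Pull the total block count from fsck and apply the 1GB-per-1M-blocks rule, plus ~2GB headroom for growth.
blocks=$(hdfs fsck / 2>/dev/null | grep 'Total blocks (validated):' | awk '{print $4}')
echo "Total blocks: ${blocks}"
echo "Suggested heap (GB): $(( blocks / 1000000 + 2 ))"

For example, with 6,543,567 blocks this prints a suggested heap of 8 GB, matching the sizing above.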
 
After that you can adjust the Java Heap Size for NN. Hope this helps. 

Cheers!

Master Mentor

@Amn_468 

Try increasing the Java Heap Size for the NameNode and Secondary NameNode services; you could be using the default 1GB heap setting.

As a general rule of thumb, every 1 million blocks in your cluster should have at least 1GB of heap:

  • 2 Million Blocks 2GB heap size
  • 3 Million Blocks 3GB heap size
    .....
  • n Million Blocks n GB heap size

After increasing the Java Heap Size, restart the HDFS services; that should resolve the issue.
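For reference, a minimal sketch of where the heap is actually set (the sizes below are placeholders; derive them from the block-count rule above). On a CM/CDH cluster you would normally change "Java Heap Size of NameNode in Bytes" (and the Secondary NameNode equivalent) in Cloudera Manager and restart, rather than editing files; the hadoop-env.sh form below applies to unmanaged clusters:

# hadoop-env.sh (illustrative values only)
export HADOOP_NAMENODE_OPTS="-Xms4g -Xmx4g ${HADOOP_NAMENODE_OPTS}"
export HADOOP_SECONDARYNAMENODE_OPTS="-Xms4g -Xmx4g ${HADOOP_SECONDARYNAMENODE_OPTS}"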

 

Please revert

Rising Star

Hello @GangWar @Shelton 

 

Appreciate your assistance.

Following is the information available from the NN Web UI:

23,326,719 files and directories, 22,735,340 blocks = 46,062,059 total filesystem object(s).

Heap Memory used 5.47 GB of 10.6 GB Heap Memory. Max Heap Memory is 10.6 GB.

Non Heap Memory used 120.51 MB of 122.7 MB Committed Non Heap Memory. Max Non Heap Memory is <unbounded>.

 

Could you please re-confirm whether I need to adjust the NN heap memory or the DN heap memory, as the issue is seen on a data node, and only on one data node; the other eight seem to be running without any issues.

 

Thanks 

Amn

Master Mentor

@Amn_468 

 

The NameNode is solely responsible for the cluster metadata, so please increase the NN heap size and restart the services.
Please revert 

 

Rising Star

@Shelton 

 

Apologies for the delay in replying. For my understanding, if possible, would you please explain how increasing the NN heap would fix the DN pause duration?

 

Thanks in advance

Amn

Master Mentor

@Amn_468 

The NameNode is the brain of the cluster: it holds the footprint of the cluster, the locations of files, ACLs, and the HDFS metadata, the directory tree of all files in the file system, and it tracks files across the cluster; it does not store the actual data or datasets. The data itself is stored on the DataNodes.

Your error

 

2020-10-27 16:20:05,140 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1821ms
GC pool 'ParNew' had collection(s): count=1 time=2075ms

 

This indicates that the NameNode paused for longer than the expected time of 60000ms. This also explains why the DataNode did not get a response from the NameNode within the designated 60000ms.

 

The warning also indicates that the pause was due to GC, which calls for memory and GC tuning.

The NameNode knows the location and list of the blocks; with this information it knows how to reconstruct a file from its blocks. The fastest way to serve this information is to keep it in memory, which is why the NN is usually a high-end server configured with a lot of memory (RAM): the block locations are stored in RAM.

 

An ideal starting configuration in production for a NameNode and a DataNode would be:

Name Node Configuration

 

Processors: 2 Quad Core CPUs running @ 2 GHz
RAM: 128 GB
Disk: 6 x 1TB SATA
Network: 10 Gigabit Ethernet

 

Data Node Configuration

 

Processors: 2 Quad Core CPUs running @ 2 GHz
RAM: 64 GB
Disk: 12-24 x 1TB SATA
Network: 10 Gigabit Ethernet

 

A fundamental parameter for garbage collector tuning is the number of HDFS blocks stored in the Hadoop cluster; in your case, 23,326,719 files. The number of files, and the associated blocks, is a fundamental parameter in the tuning process. The NameNode maintains the complete directory structure in memory, so more files mean more objects to manage. Most of the time, Hadoop clusters are configured without knowledge of the final workload in terms of the number of files that will be stored. Keeping the strong connection between these two aspects in mind is crucial to anticipating future turbulence in the HDFS quality of service.

 

You should analyze the log output produced by the garbage collector (the gc.log files found in the NameNode logs directory) to see whether the available memory is filling up before the garbage collector is able to release it.
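If GC logging is not already enabled for the role, a minimal sketch of the classic Java 7/8 flags that produce such gc.log files (on CDH these would go into the role's Java options in Cloudera Manager; the log path is only illustrative):

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/hadoop-hdfs/gc.log

With that in place you can watch whether old-generation usage keeps climbing between collections, which is the filling-up pattern described above.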

Hope that helps

Expert Contributor

Hello @Amn_468 The DN pause alert you see on 1 of 9 DataNodes is an indication of a growing block count on that node.

Compared to the other DNs, this DN in question has possibly stored a higher number of blocks than the other nodes. You may compare the block counts of each DN via HDFS > Web UI > Active NN Web UI > Datanodes > check the Blocks column under the "In operation" section.
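If it is easier than clicking through the UI, here is a rough sketch (host and port are placeholders, and the numBlocks field is an assumption based on what the NameNode's JMX LiveNodes attribute typically exposes) that lists the per-DataNode block counts:

curl -s 'http://ACTIVE_NN_HOST:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo' |
python -c '
import json, sys
bean = json.load(sys.stdin)["beans"][0]   # NameNodeInfo bean
live = json.loads(bean["LiveNodes"])      # JSON string keyed by DataNode
for dn, info in sorted(live.items()):
    print("%-45s %s blocks" % (dn, info.get("numBlocks", "n/a")))
'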

 

The log snippet you shared indicates a pause of only about 2 seconds, which is not a cause for worry. However, with a proper JVM heap size allocated for the DN, you may avoid these frequent pause alerts.

 

As a rule of thumb you need about 1GB of heap per 1 million blocks, and since you have 6GB allocated for the DN heap, please verify the block counts on the DNs and ensure they are not too high (> 6 million), which may explain why there are so many pause alerts.

 

In case the block count is higher than expected, it means you need to increase the heap size to accommodate the block objects in JVM heap memory.

 

On a side note, a growing block count is also an early warning/indication of a small files problem in the cluster. You need to be vigilant about that. Verify the average block size; that will help you understand whether you have a small files problem in your cluster.

 

Regards,

Pabitra Das

 

 

 

Rising Star

Hello @PabitraDas,

Appreciate your assistance. Below are the block counts on our DNs; as mentioned earlier, we have allocated 6 GB of JVM heap for the DNs and 10 GB of heap for the NN & SNN.

Do you suggest increasing the DN heap, or the NN / SNN heap as suggested by Shelton?

Block Count:

Node 1 = 7421379 
Node 2 = 5569699
Node 3 = 6003009
Node 4 = 7444205
Node 5 = 8770674
Node 6 = 8849641
Node 7 = 8232779
Node 8 = 8354714
Node 9 = 8860602

Also, I would greatly appreciate any pointers / suggestions (scripts, etc.) to identify the small files issue and possible remediation.

 

Thanks 

Amn

Expert Contributor (accepted solution)

Hello @Amn_468 

Since you reported the DN pause time, I spoke/referred about the DN heap only. The block counts on most of the DNs seem to be > 6 million, hence I would suggest increasing the DN heap to 8GB (from the current value of 6GB) and performing a rolling restart to bring the new heap size into effect.

 

There is no straightforward way to say you have hit the small files problem, but if your average block size is a few MB, or less than a MB, it is an indication that you are storing/accumulating small files in HDFS. The simplest way to determine small files in the cluster is to run fsck.

 

Fsck should show the average block size. If it is a very low value (e.g. ~1MB), you might be hitting the small files problem, which would be worth looking at; otherwise, there is no need to review the number of blocks.

 

[..]
$ hdfs fsck /
...
 Total blocks (validated):      2899 (avg. block size 11475601 B)   <<<<<
[..]

 

You may refer to the below links for help on dealing with small files.

https://blog.cloudera.com/small-files-big-foils-addressing-the-associated-metadata-and-application-c...

https://community.cloudera.com/t5/Community-Articles/Identify-where-most-of-the-small-file-are-locat...
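In case a quick command-line check helps before going through the articles above, a rough sketch (the path is a placeholder) that lists files smaller than 1 MB under a given directory:

# Column 5 of "hdfs dfs -ls" output is the file size in bytes; skip directory entries.
hdfs dfs -ls -R /some/path | awk '$1 !~ /^d/ && $5 < 1048576 {print $5, $8}' | head -n 20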