Support Questions

Find answers, ask questions, and share your expertise

Do I need to tune Java heap size

Contributor

for the NameNode, for DataNodes, and for YARN/Spark? Or are the defaults provided by Ambari suitable for production use?

1 ACCEPTED SOLUTION

Master Guru

Ambari applies some heuristics, but it never hurts to double-check.

DataNode: 1GB works, but for typical worker nodes (16 cores, 12-14 drives, 128-256GB RAM) I normally set it to 4GB.

NameNode: Depends on HDFS size. A good rule of thumb is 1GB of heap per 100TB of data in HDFS.

Spark: This depends entirely on your Spark needs (not the History Server, but the defaults for executors). The more power you need, the more executors and the more RAM per executor (up to 32GB per executor is apparently a good limit).

YARN: Ambari's heuristics are decent, but I normally tune them. In the end you should adjust the sizes until your cluster has good CPU utilization, but that is more involved and depends on your use cases (see the sizing sketch below).
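To make those rules of thumb concrete, here is a minimal Python sketch that just encodes them as arithmetic. The function name, the 4GB NameNode floor, and the example input are my own illustrative assumptions, not Ambari defaults; the actual values still get set through the Ambari heap configs for each service.

```python
# Minimal sketch of the rules of thumb above (illustrative only).

def suggest_heaps_gb(hdfs_used_tb: float, typical_worker: bool = True) -> dict:
    """Return rough heap suggestions in GB based on the heuristics above."""
    return {
        # ~1GB of NameNode heap per 100TB of data in HDFS
        # (the 4GB floor is an added assumption for small clusters)
        "namenode": max(4, round(hdfs_used_tb / 100)),
        # 4GB for DataNodes on typical worker nodes
        # (16 cores, 12-14 drives, 128-256GB RAM); 1GB only for small nodes
        "datanode": 4 if typical_worker else 1,
        # Spark executor memory depends on the workload;
        # ~32GB per executor is the rough upper bound mentioned above
        "spark_executor_max": 32,
    }

# Example: a cluster with ~500TB in HDFS
print(suggest_heaps_gb(hdfs_used_tb=500))
# {'namenode': 5, 'datanode': 4, 'spark_executor_max': 32}
```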


4 Replies

@Kartik Vashishta

Tuning Java heap size depends entirely on your use case. Are you seeing any performance-related issues with your current heap configs?

Here is the recommendation from Hortonworks for the NameNode: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-809...

Hi @Kartik Vashishta, I can answer for the HDFS services.

The NameNode heap size depends on the total number of file system objects you have (files and blocks). The exact heap tuning recommendations are documented in the HDP manual install section (the same link that @Sandeep Nemuri provided in another answer). I recommend checking that the Ambari-configured values are in line with those recommendations, since misconfigured heap settings affect NameNode performance significantly. Also, heap size requirements change over time as cluster usage grows.

The DataNode heap size requirement depends on the total number of blocks on each DataNode. The default 1GB heap is insufficient for larger-capacity DataNodes; we now recommend a 4GB heap for DataNodes, as Benjamin suggested.

Ensuring you have GC logging enabled for your services is a good idea. There is an HCC article on NameNode heap tuning that goes into a lot more detail on related topics.
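To show what you might do with that GC log once it is enabled, here is a minimal sketch that totals pause times from a NameNode GC log. The log path and the "N.NNNNNN secs" line format are assumptions about typical -verbose:gc / -XX:+PrintGCDetails output, so adjust both to your environment.

```python
import re

# Sketch: summarize GC pauses from a NameNode GC log (path is illustrative).
# Assumes -verbose:gc / -XX:+PrintGCDetails style lines ending in "..., 0.1234567 secs]".
PAUSE_RE = re.compile(r"([0-9]+\.[0-9]+) secs\]")

pauses = []
with open("/var/log/hadoop/hdfs/gc.log-namenode") as gc_log:
    for line in gc_log:
        match = PAUSE_RE.search(line)
        if match:
            pauses.append(float(match.group(1)))

if pauses:
    print(f"GC events: {len(pauses)}")
    print(f"worst pause: {max(pauses):.2f}s  total pause: {sum(pauses):.2f}s")
else:
    print("no GC pauses found - check the log path/format")
```

Long or frequent full-GC pauses in that summary are usually the first sign that the heap sizes discussed above need revisiting.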

New Contributor

@Arpit is right that you should do an actual calculation for the NameNode heap and keep it up to date as your data grows. I know this thread is mostly about DataNodes, but since the NameNode was brought up multiple times, I just want to point out that Cloudera recommends 1GB of heap per million files+blocks as a good starting point. Once you get to many millions of files and blocks you can reduce it, but start there.
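As a worked example of that starting point (the object counts below are made up purely for illustration; read the real numbers from the NameNode web UI or fsck):

```python
import math

# Cloudera's starting point quoted above: ~1GB of NameNode heap
# per million file system objects (files + blocks).
files = 20_000_000   # hypothetical
blocks = 28_000_000  # hypothetical

heap_gb = math.ceil((files + blocks) / 1_000_000)
print(f"suggested NameNode heap to start from: ~{heap_gb}GB")  # ~48GB
```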