
Do I need to tune Java heap size


Do I need to tune the Java heap size for the NameNode, for DataNodes, and for YARN/Spark? Or are the defaults provided by Ambari suitable for production use?

4 REPLIES

@Kartik Vashishta

Tuning the Java heap size depends entirely on your use case. Are you seeing any performance-related issues with your current heap configuration?

Here is the recommendation from Hortonworks for the NameNode: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-809...

Hi @Kartik Vashishta, I can answer for the HDFS services.

The NameNode heap size depends on the total number of file system objects you have (files and blocks). The exact heap tuning recommendations are documented in the HDP manual install section (the same link that @Sandeep Nemuri provided in another answer). I recommend checking that the Ambari-configured values are in line with these recommendations, since misconfigured heap settings affect NameNode performance significantly. Also, the heap size requirement changes over time as cluster usage grows.
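
If it helps, here is a minimal sketch in Python for tracking those counts. The NameNode hostname and the Hadoop 2.x default web port 50070 are assumptions, so adjust them for your cluster. It reads the current file and block counts from the NameNode's /jmx endpoint; recording these periodically gives you the growth trend you need when revisiting the heap setting.

```python
import json
from urllib.request import urlopen

# Assumption: NameNode web UI address; 50070 is the Hadoop 2.x default (9870 on Hadoop 3.x).
NAMENODE_HTTP = "http://namenode.example.com:50070"

# The FSNamesystem bean exposes FilesTotal and BlocksTotal among its metrics.
url = NAMENODE_HTTP + "/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem"
with urlopen(url) as resp:
    bean = json.load(resp)["beans"][0]

files = bean["FilesTotal"]
blocks = bean["BlocksTotal"]
print(f"Files:  {files:,}")
print(f"Blocks: {blocks:,}")
print(f"Namespace objects (files + blocks): {files + blocks:,}")
```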

The DataNode heap size requirement depends on the total number of blocks on each DataNode. The default 1 GB heap is insufficient for larger-capacity DataNodes; we now recommend a 4 GB heap for DataNodes, as Benjamin suggested.
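
As a quick sanity check, a similar sketch (the DataNode hostname and the Hadoop 2.x default web port 50075 are again assumptions) can read the actual maximum heap of a running DataNode from its /jmx endpoint and compare it against the 4 GB suggested above.

```python
import json
from urllib.request import urlopen

# Assumption: DataNode web UI address; 50075 is the Hadoop 2.x default.
DATANODE_HTTP = "http://datanode1.example.com:50075"

# The standard JVM Memory MXBean is served through Hadoop's /jmx servlet.
url = DATANODE_HTTP + "/jmx?qry=java.lang:type=Memory"
with urlopen(url) as resp:
    heap = json.load(resp)["beans"][0]["HeapMemoryUsage"]

max_heap_gb = heap["max"] / (1024 ** 3)
print(f"Configured max heap: {max_heap_gb:.1f} GB")
if max_heap_gb < 4:
    print("Heap is below the ~4 GB suggested for DataNodes holding many blocks.")
```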

Ensuring you have GC logging enabled for your services is a good idea. There is an HCC article on NameNode heap tuning that goes into a lot more detail on related topics.
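
For anyone looking for a starting point: on Java 7/8 the usual GC logging flags are -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<path>, appended to the relevant daemon's *_OPTS variable in hadoop-env.sh. Once a log exists, a rough sketch like the one below can summarize pause times. The log path is hypothetical, and the regex only targets the single-line event format produced by the throughput collector with PrintGCDetails.

```python
import re

# Hypothetical path: point this at whatever file you pass to -Xloggc.
GC_LOG = "/var/log/hadoop/hdfs/gc.log"

# Captures the total pause reported at the end of a GC event, e.g.
# "... 123456K->7890K(262144K), 0.0123456 secs]".
PAUSE_RE = re.compile(r", ([0-9.]+) secs\]")

pauses = []
with open(GC_LOG) as log:
    for line in log:
        match = PAUSE_RE.search(line)
        if match:
            pauses.append(float(match.group(1)))

if pauses:
    print(f"GC events: {len(pauses)}")
    print(f"Total pause time: {sum(pauses):.1f} s, worst single pause: {max(pauses):.2f} s")
```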

New Contributor

@Arpit is right that you should do an actual calculation for the NameNode heap and keep it up to date as your data grows. I know this thread is about DataNodes too, but since the NameNode was brought up multiple times, I just want to point out that Cloudera recommends 1 GB of heap per million files + blocks as a good starting point. Once you get to many millions of files and blocks you can reduce that ratio, but start there.
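
To make that concrete, here is a back-of-the-envelope sketch with made-up counts; substitute the values reported by your own NameNode (fsck, the NameNode UI, or the FSNamesystem JMX bean shown earlier in the thread).

```python
# Hypothetical counts; plug in your own numbers.
files = 20_000_000
blocks = 28_000_000

# Starting point: roughly 1 GB of NameNode heap per million files + blocks.
objects = files + blocks
heap_gb = objects / 1_000_000
print(f"{objects:,} namespace objects -> start with roughly {heap_gb:.0f} GB of NameNode heap")
```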