How to estimate the Optimal Java Heap Size for Navigator Metadata Server when there is no nav_elements nor nav_relations?

Explorer

Hello, 

 

I have a Production cluster and I get out-of-memory errors for Navigator Metadata Server. These generate large dump files (33 GB last time) in /tmp. Since /tmp has only 50 GB, it fills up quite fast.


Errors:

The health test result for NAVIGATORMETASERVER_UNEXPECTED_EXITS has become bad: This role encountered 1 unexpected exit(s) in the previous 5 minute(s).This included 1 exit(s) due to OutOfMemory errors. Critical threshold: any.



The health test result for NAVIGATORMETASERVER_DATA_DIRECTORY_FREE_SPACE has become unknown: Not enough data to test: Test of whether the Navigator Metadata Server Storage Dir has enough free space.

 

 

 

 

 

There is a formula to estimate the optimal Java heap size for Navigator Metadata Server, but in my case the cloudera-scm-navigator log file contains no nav_elements or nav_relations entries. Is there any other way to estimate the optimal Java heap size? The current value is 24 GB.

Formula: ((num_nav_elements + num_nav_relations) * 200 bytes) + 2 GB 
Link from Cloudera Docs
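
For anyone who does have the two counts, the formula above can be turned into a quick calculation. This is only a minimal sketch, not an official Cloudera tool: the function name `estimate_heap_bytes` is hypothetical, the counts in the example are made up for illustration, and "2 GB" is assumed to mean 2 GiB (2 * 1024^3 bytes).

```python
GIB = 1024 ** 3  # assumption: the "2 GB" in the formula is read as 2 GiB


def estimate_heap_bytes(num_nav_elements: int,
                        num_nav_relations: int,
                        bytes_per_doc: int = 200,
                        base_bytes: int = 2 * GIB) -> int:
    """Apply ((num_nav_elements + num_nav_relations) * 200 bytes) + 2 GB."""
    if num_nav_elements < 0 or num_nav_relations < 0:
        raise ValueError("document counts must be non-negative")
    return (num_nav_elements + num_nav_relations) * bytes_per_doc + base_bytes


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB for easier reading."""
    return num_bytes / GIB


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    estimate = estimate_heap_bytes(50_000_000, 60_000_000)
    print(f"Estimated heap: {to_gib(estimate):.2f} GiB")
```

The estimate is only as good as the two counts, so it cannot be applied until the nav_elements and nav_relations numbers are found.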

Thank you,

 


Moderator

Hello @md186036 ,

 

Thank you for your questions on:

  1. How to disable the heap dump, so that your /tmp folder does not fill up on OOM
  2. How to calculate the optimal heap size for Navigator Metadata Server if you cannot locate the nav_elements and nav_relations pattern in the cloudera-scm-navigator log file

 

  1. In CM -> Cloudera Management Service -> Configuration -> Category: Advanced -> Uncheck "Dump Heap When Out of Memory"
  2. To assist you more efficiently, could you please specify the CDH version you are using?

 

Thank you:

Ferenc


Ferenc Erdelyi, Technical Solutions Manager

Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Learn more about the Cloudera Community:

Explorer

Hello @Bender ,

Thanks a lot for your fast answer. 

The CDH version I am using is 5.15.2. 

I checked the Development cluster, and there I can find the number of elements in the log file, but in the Production cluster I cannot.

Thank you, 
Daniel 

Moderator

Hello @md186036 ,

 

Do you see INFO-level messages in the prod cluster log? I suspect that your log level is set to a higher threshold such as WARN or ERROR, which may be why you do not see the "nav_" entries. The threshold can be set via CM's service configuration too.

 

Thank you:

Ferenc


Ferenc Erdelyi, Technical Solutions Manager


Explorer

Hi @Bender ,
The Navigator Metadata Server Logging Threshold is set to INFO.

 

INFO.png

 



Thanks,

Daniel 

Moderator

Hello @md186036 ,

 

I have checked the Navigator logs in a test cluster for CDH5.16 (for our purposes this minor version difference should not matter much) and found the entries mentioned in this doc for CDH5.15:

2020-07-29 17:41:18,151 INFO com.cloudera.nav.server.NavServerUtil [main]: Found 885 documents in solr core nav_elements
2020-07-29 17:41:18,155 INFO com.cloudera.nav.server.NavServerUtil [main]: Found 916 documents in solr core nav_relations
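
To search a downloaded role log for these entries without scrolling through it by hand, something like the following could help. It is a sketch under assumptions: the regular expression is based only on the two sample lines above, and the helper name `latest_core_counts` is hypothetical.

```python
import re

# Pattern derived from the two sample log lines above; other Navigator
# versions may word the message differently (assumption).
CORE_COUNT_RE = re.compile(
    r"Found (\d+) documents in solr core (nav_elements|nav_relations)"
)


def latest_core_counts(lines):
    """Return the most recent document count seen for each Solr core."""
    counts = {}
    for line in lines:
        match = CORE_COUNT_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last value wins.
            counts[match.group(2)] = int(match.group(1))
    return counts


sample = [
    "2020-07-29 17:41:18,151 INFO com.cloudera.nav.server.NavServerUtil [main]: "
    "Found 885 documents in solr core nav_elements",
    "2020-07-29 17:41:18,155 INFO com.cloudera.nav.server.NavServerUtil [main]: "
    "Found 916 documents in solr core nav_relations",
]
print(latest_core_counts(sample))
# prints {'nav_elements': 885, 'nav_relations': 916}
```

For a real log, pass an open file handle instead of `sample`. An empty result means no such message was written to that file. The entries above were logged by the [main] thread, so they may only appear around role startup, and that part of the log may have been rotated out.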

 

I navigated to CM -> Cloudera Management Service -> Navigator Metadata Server, then clicked Log Files -> Role Log File.

I downloaded the file by clicking "Download Full Log".

The downloaded log contains INFO-level entries.

 

Do you also see INFO-level entries in your log?

I would like to rule out that the log level change was never applied (e.g. the role was not restarted after the config change) or that the log level was changed by other means (e.g. without a restart).
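
A quick way to answer the question above is to count the entries per log level in the downloaded file. The sketch below assumes the line layout shown in the sample entries (timestamp, then level, then class name). The helper name `count_levels` and the WARN line in the sample are hypothetical.

```python
import re
from collections import Counter

# Assumes lines start with "YYYY-MM-DD HH:MM:SS,mmm LEVEL ", as in the
# sample entries quoted earlier in this reply.
LEVEL_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} "
    r"(TRACE|DEBUG|INFO|WARN|ERROR|FATAL) "
)


def count_levels(lines):
    """Count log entries per level; continuation lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "2020-07-29 17:41:18,151 INFO com.cloudera.nav.server.NavServerUtil [main]: "
    "Found 885 documents in solr core nav_elements",
    "2020-07-29 17:41:18,155 INFO com.cloudera.nav.server.NavServerUtil [main]: "
    "Found 916 documents in solr core nav_relations",
    # Hypothetical line, included only to show a second level being counted.
    "2020-07-29 17:41:19,000 WARN com.example.Hypothetical [main]: made-up message",
    "java.lang.NullPointerException",  # stack-trace style line, not counted
]
print(count_levels(sample))
```

If the INFO count is zero while WARN and ERROR are present, the threshold is probably not in effect. If INFO entries exist but none mention nav_elements, the cause lies elsewhere.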

 

Thank you:
Ferenc


Ferenc Erdelyi, Technical Solutions Manager


Explorer

Hi @Bender ,

Thanks a lot for your research. I will check it and get back to you later today. 

Best regards,
Daniel

Explorer

Hi @Bender ,

I do have entries with INFO in the log file but I can't find any with nav_elements. I also have some errors and warnings: 

ERROR	SparkPushExtractor	
[qtp1810923540-17908]: com.cloudera.nav.pushextractor.spark.SparkPushExtractor Error extracting Spark operation.
java.lang.NullPointerException

WARN	ApiExceptionMapper	
[qtp1810923540-17908]: Unexpected exception.
java.lang.NullPointerException

ERROR	SparkPushExtractor	
[qtp1810923540-17879]: com.cloudera.nav.pushextractor.spark.SparkPushExtractor Error extracting Spark operation.
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Expected mime type application/octet-stream but got application/xml. <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="error"><str name="msg">application/x-www-form-urlencoded content length (3820497 bytes) exceeds upload limit of 2048 KB</str><int name="code">400</int></lst>
</response>

 

I restarted Cloudera Management Server on July 29th. 

Thanks, 
Daniel

Moderator

Hello @md186036 ,

 

The error message you pointed out [1] appears to be a known issue, tracked by the following internal JIRA ticket:
NAV-7272 - NPE in getEpIdentitiesForMissingRelations

As per the JIRA ticket:
"An NPE is being caused by getEpIdentitiesForMissingRelations() during Spark extraction. The condition that causes it is rare, however, once the condition exists, because of the NPE, it will continue forever.
The code is trying to detect ep2Ids for linked relations that are missing so they can be added. However, the code fails to check for null in the case that this is true."

 

The fix is not yet available in any released CDH distribution. It might be included in CDH 6.4.0, 5.16.3, 6.2.2, 6.3.4, 7.1.1, and 5.17.0.

 

My understanding is that this bug can prevent new metadata from being produced. If you have a Cloudera Support Subscription, please file a support ticket with us so we can assist you further, as no workaround has been identified for this bug.

 

Thank you:
Ferenc

 

[1] 

ERROR	SparkPushExtractor	
[qtp1810923540-17908]: com.cloudera.nav.pushextractor.spark.SparkPushExtractor Error extracting Spark operation.
java.lang.NullPointerException

 


Ferenc Erdelyi, Technical Solutions Manager


Explorer

Hi @Bender ,

 

Thanks a lot for your answers.
I will open a case in Cloudera Support for it. 

Kind regards,

Daniel