NAVIGATORMETASERVER_SCM_HEALTH has become bad


We have had a CDH 5.7.0 environment for the past 18 months, and everything was OK until 2 days ago. For the past 2 days, however, it has kept throwing the alert below.

Alert message from Cloudera Manager

The health test result for NAVIGATORMETASERVER_SCM_HEALTH has become bad: This role's process is starting. This role is supposed to be started.
Time: Sep 1, 2016 11:52:29 AM
View Details on <hostname>
Monitor Startup: false
Role: Navigator Metadata Server (<hostname>)
Role Type: Navigator Metadata Server
Cluster: <Clustername>
Cluster Display Name: <Clustername>
Service: mgmt
Service Display Name: mgmt
Service Type: Cloudera Management Service
Hosts: <Hostname>
Health Test Results: The health test result for NAVIGATORMETASERVER_SCM_HEALTH has become bad: This role's process is starting. This role is supposed to be started

 

 

a. Is there any way to fix this issue?

b. If not: this issue sends an alert message to our distribution group every 5 minutes, so we end up with hundreds of emails by the end of each day. Is it OK to disable only this alert to avoid the junk mail? If so, how do we disable only this alert?

c. I am looking for a quick fix for now, because upgrading to the latest version needs a lot of approvals and is a long-term fix. Unfortunately, now is not the time for that.

Appreciate your help.

1. cd /var/log/cloudera-scm-firehose
sudo tail -100 mgmt-cmf-mgmt-SERVICEMONITOR-<hostname>.com.log.out

2016-09-01 13:16:31,995 INFO hive.metastore: Trying to connect to metastore with URI thrift://<hostname>:9083
2016-09-01 13:16:31,996 INFO hive.metastore: Opened a connection to metastore, current connections: 1
2016-09-01 13:16:31,996 INFO hive.metastore: Connected to metastore.
2016-09-01 13:16:32,665 INFO hive.metastore: Closed a connection to metastore, current connections: 0
2016-09-01 13:19:24,750 ERROR com.cloudera.cmf.BasicScmProxy: Failed request to SCM: 302
2016-09-01 13:19:25,749 INFO com.cloudera.cmf.BasicScmProxy: Authentication to SCM required.
2016-09-01 13:19:25,815 INFO com.cloudera.cmf.BasicScmProxy: Using encrypted credentials for SCM
2016-09-01 13:19:25,818 INFO com.cloudera.cmf.BasicScmProxy: Authenticated to SCM.
2016-09-01 13:20:09,250 INFO com.cloudera.cmon.tstore.leveldb.LDBPartitionManager: Expiring partition LDBPartitionMetadataWrapper{tableName=type, partitionName=type_2016-08-31T05:11:09.207Z, startTime=2016-08-31T05:11:09.207Z, endTime=2016-08-31T05:41:09.207Z, version=2, state=CLOSED}


2. cd /var/log/cloudera-scm-firehose
sudo tail -500 mgmt-cmf-mgmt-HOSTMONITOR-eqa-<hostname>.log.out

2016-09-01 13:15:10,693 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2016-09-01T17:15:10.693Z, forMigratedData=false
2016-09-01 13:17:24,683 ERROR com.cloudera.cmf.BasicScmProxy: Failed request to SCM: 302
2016-09-01 13:17:25,683 INFO com.cloudera.cmf.BasicScmProxy: Authentication to SCM required.
2016-09-01 13:17:25,749 INFO com.cloudera.cmf.BasicScmProxy: Using encrypted credentials for SCM
2016-09-01 13:17:25,752 INFO com.cloudera.cmf.BasicScmProxy: Authenticated to SCM.
2016-09-01 13:20:10,693 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Running the LDBTimeSeriesRollupManager at 2016-09-01T17:20:10.693Z, forMigratedData=false
2016-09-01 13:20:10,693 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Starting rollup from raw to rollup=TEN_MINUTELY for rollupTimestamp=2016-09-01T17:20:00.000Z
2016-09-01 13:20:11,654 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT0.961S, numStreamsChecked=41773, numStreamsRolledUp=4448

 

 

3. I've copied the alerts below from Cloudera Manager -> mgmt -> Quick Links -> Alert. All ten entries are identical apart from the time:

CRITICAL
The health test result for NAVIGATORMETASERVER_SCM_HEALTH has become bad: This role's process is starting. This role is supposed to be started.
Service: mgmt Role: Navigator Metadata Server (<hostname>) Hosts: <hostname>
Category: HEALTH_CHECK

Times (September 2, 2016): 6:04 AM, 5:51 AM, 5:37 AM, 5:26 AM, 5:11 AM, 4:59 AM, 4:46 AM, 4:33 AM, 4:20 AM, 4:04 AM

The expanded log is as follows:

ALERT_SUMMARY
The health of role Navigator Metadata Server (<hostname>) has become bad.
BAD_TEST_RESULTS
1
CATEGORY
HEALTH_CHECK
CLUSTER
<Clustername>
CLUSTER_DISPLAY_NAME
<Clustername>
CLUSTER_ID
1
CURRENT_COMPLETE_HEALTH_TEST_RESULTS
{"content":"The health test result for NAVIGATORMETASERVER_SCM_HEALTH has become bad: This role's process is starting. This role is supposed to be started.","testName":"NAVIGATORMETASERVER_SCM_HEALTH","eventCode":"EV_ROLE_HEALTH_CHECK_BAD","severity":"CRITICAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_UNEXPECTED_EXITS has become good: This role encountered 0 unexpected exit(s) in the previous 5 minute(s).","testName":"NAVIGATORMETASERVER_UNEXPECTED_EXITS","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_FILE_DESCRIPTOR has become good: Open file descriptors: 462. File descriptor limit: 32,768. Percentage in use: 1.41%.","testName":"NAVIGATORMETASERVER_FILE_DESCRIPTOR","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_SWAP_MEMORY_USAGE has become good: 0 B of swap memory is being used by this role's process.","testName":"NAVIGATORMETASERVER_SWAP_MEMORY_USAGE","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_LOG_DIRECTORY_FREE_SPACE has become good: This role's Log Directory (/var/log/cloudera-scm-navigator) is on a filesystem with more than 10.0 GiB of its space free.","testName":"NAVIGATORMETASERVER_LOG_DIRECTORY_FREE_SPACE","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_HOST_HEALTH has become good: The health of this role's host is good.","testName":"NAVIGATORMETASERVER_HOST_HEALTH","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_AUDIT_EVENT_LOG_DIRECTORY_FREE_SPACE has become good: This role's Audit Log Directory (/var/log/cloudera-scm-navigator/audit) is on a filesystem with more than 10.0 GiB of its space free.","testName":"NAVIGATORMETASERVER_AUDIT_EVENT_LOG_DIRECTORY_FREE_SPACE","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_DATA_DIRECTORY_FREE_SPACE has become good: This role's Navigator Metadata Server Storage Dir (/var/lib/cloudera-scm-navigator) is on a filesystem with more than 10.0 GiB of its space free.","testName":"NAVIGATORMETASERVER_DATA_DIRECTORY_FREE_SPACE","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_HEAP_DUMP_DIRECTORY_FREE_SPACE has become disabled: Test disabled because role is not configured to dump heap when out of memory. Test of whether this role's heap dump directory has enough free space.","testName":"NAVIGATORMETASERVER_HEAP_DUMP_DIRECTORY_FREE_SPACE","eventCode":"EV_ROLE_HEALTH_CHECK_DISABLED","severity":"INFORMATIONAL","suppressed":false}
CURRENT_HEALTH_SUMMARY
RED
EVENTCODE
EV_ROLE_HEALTH_CHECK_BAD,EV_ROLE_HEALTH_CHECK_GOOD,EV_ROLE_HEALTH_CHECK_DISABLED
HEALTH_TEST_NAME
NAVIGATORMETASERVER_SCM_HEALTH
HEALTH_TEST_RESULTS
NAVIGATORMETASERVER_SCM_HEALTH
Event Code: EV_ROLE_HEALTH_CHECK_BAD
HOSTS
<hostname>
HOST_IDS
4d290e2a-5788-44fb-82c0-0708f95db286
MONITOR_STARTUP
false
PREVIOUS_COMPLETE_HEALTH_TEST_RESULTS
{"content":"The health test result for NAVIGATORMETASERVER_SCM_HEALTH has become good: This role's status is as expected. The role is started.","testName":"NAVIGATORMETASERVER_SCM_HEALTH","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_UNEXPECTED_EXITS has become good: This role encountered 0 unexpected exit(s) in the previous 5 minute(s).","testName":"NAVIGATORMETASERVER_UNEXPECTED_EXITS","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_FILE_DESCRIPTOR has become good: Open file descriptors: 462. File descriptor limit: 32,768. Percentage in use: 1.41%.","testName":"NAVIGATORMETASERVER_FILE_DESCRIPTOR","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_SWAP_MEMORY_USAGE has become good: 0 B of swap memory is being used by this role's process.","testName":"NAVIGATORMETASERVER_SWAP_MEMORY_USAGE","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_LOG_DIRECTORY_FREE_SPACE has become good: This role's Log Directory (/var/log/cloudera-scm-navigator) is on a filesystem with more than 10.0 GiB of its space free.","testName":"NAVIGATORMETASERVER_LOG_DIRECTORY_FREE_SPACE","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_HOST_HEALTH has become good: The health of this role's host is good.","testName":"NAVIGATORMETASERVER_HOST_HEALTH","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_AUDIT_EVENT_LOG_DIRECTORY_FREE_SPACE has become good: This role's Audit Log Directory (/var/log/cloudera-scm-navigator/audit) is on a filesystem with more than 10.0 GiB of its space free.","testName":"NAVIGATORMETASERVER_AUDIT_EVENT_LOG_DIRECTORY_FREE_SPACE","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_DATA_DIRECTORY_FREE_SPACE has become good: This role's Navigator Metadata Server Storage Dir (/var/lib/cloudera-scm-navigator) is on a filesystem with more than 10.0 GiB of its space free.","testName":"NAVIGATORMETASERVER_DATA_DIRECTORY_FREE_SPACE","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false},{"content":"The health test result for NAVIGATORMETASERVER_HEAP_DUMP_DIRECTORY_FREE_SPACE has become disabled: Test disabled because role is not configured to dump heap when out of memory. Test of whether this role's heap dump directory has enough free space.","testName":"NAVIGATORMETASERVER_HEAP_DUMP_DIRECTORY_FREE_SPACE","eventCode":"EV_ROLE_HEALTH_CHECK_DISABLED","severity":"INFORMATIONAL","suppressed":false}
PREVIOUS_HEALTH_SUMMARY
GREEN
ROLE
Navigator Metadata Server (<hostname>)
ROLE_TYPE
NAVIGATORMETASERVER
SERVICE
mgmt
SERVICE_TYPE
MGMT
SEVERITY
CRITICAL
URL
https://<hostname>:7183/cmf/eventRedirect/
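The CURRENT_COMPLETE_HEALTH_TEST_RESULTS value above is a comma-separated run of JSON objects with no enclosing brackets, which makes it hard to read. A small Python sketch (the function name `non_informational_tests` is just for illustration) that pulls out every test whose severity is not INFORMATIONAL; on this dump that is only NAVIGATORMETASERVER_SCM_HEALTH, while the file descriptor, swap, disk space and host checks are all good, which points at the process state rather than the host.

```python
import json


def non_informational_tests(fragment):
    """Return (testName, content) for each test whose severity is not INFORMATIONAL.

    The field is a comma-separated run of JSON objects without enclosing
    brackets, so it is wrapped in [...] before parsing. strict=False lets the
    parser accept a raw line break inside a string, which can appear when the
    value is copied out of a web page.
    """
    results = json.loads("[" + fragment + "]", strict=False)
    return [(r["testName"], r["content"]) for r in results
            if r.get("severity") != "INFORMATIONAL"]


if __name__ == "__main__":
    # Two entries shortened from the dump above.
    sample = """{"content":"The health test result for NAVIGATORMETASERVER_SCM_HEALTH has become bad: This role's process is starting.","testName":"NAVIGATORMETASERVER_SCM_HEALTH","eventCode":"EV_ROLE_HEALTH_CHECK_BAD","severity":"CRITICAL","suppressed":false},
{"content":"The health test result for NAVIGATORMETASERVER_HOST_HEALTH has become good: The health of this role's host is good.","testName":"NAVIGATORMETASERVER_HOST_HEALTH","eventCode":"EV_ROLE_HEALTH_CHECK_GOOD","severity":"INFORMATIONAL","suppressed":false}"""
    for name, content in non_informational_tests(sample):
        print(f"{name}: {content}")
```

Paste the full field value in place of `sample` to check a real alert.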

Also, the alert frequency has changed: initially the alerts came every 10 to 15 minutes, but now they come as often as every 2 minutes.

Note: Our cluster usage has doubled in the past week.
 

Health History
4:34:04 PM Navigator Metadata Server Health Good
4:33:34 PM Navigator Metadata Server Health Bad
4:30:33 PM Navigator Metadata Server Health Good
4:30:03 PM Navigator Metadata Server Health Bad
4:25:47 PM Navigator Metadata Server Health Good
4:25:32 PM Navigator Metadata Server Health Bad
4:21:31 PM Navigator Metadata Server Health Good
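To put numbers on the flapping in this Health History, a short Python sketch (names are illustrative) that measures how long each Bad state lasted and how far apart consecutive Bad transitions were. In this seven-row sample the role is Bad for 15 to 30 seconds and goes Bad again every 3.5 to 4.5 minutes, a pattern that fits a process that keeps dying and being relaunched rather than one that never starts.

```python
from datetime import datetime

# Rows from the Health History above (newest first).
HISTORY = [
    ("4:34:04 PM", "Good"),
    ("4:33:34 PM", "Bad"),
    ("4:30:33 PM", "Good"),
    ("4:30:03 PM", "Bad"),
    ("4:25:47 PM", "Good"),
    ("4:25:32 PM", "Bad"),
    ("4:21:31 PM", "Good"),
]


def flap_stats(history):
    """Return (seconds each Bad state lasted, seconds between consecutive Bad transitions)."""
    # Parse clock times and order oldest first; all rows are assumed to be from the same day.
    rows = sorted((datetime.strptime(t, "%I:%M:%S %p"), state) for t, state in history)
    bad_lasted, bad_starts = [], []
    for i, (when, state) in enumerate(rows):
        if state != "Bad":
            continue
        bad_starts.append(when)
        # Time until the next Good row, if there is one.
        recovered = next((w for w, s in rows[i + 1:] if s == "Good"), None)
        if recovered is not None:
            bad_lasted.append((recovered - when).total_seconds())
    gaps = [(b - a).total_seconds() for a, b in zip(bad_starts, bad_starts[1:])]
    return bad_lasted, gaps


if __name__ == "__main__":
    lasted, gaps = flap_stats(HISTORY)
    print(lasted, gaps)  # [15.0, 30.0, 30.0] [271.0, 211.0]
```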

 

 

 

 

Hi,

 

The behavior you describe usually indicates that Navigator is exiting and then Cloudera Manager is starting it again.

Very commonly, the exit is due to a java.lang.OutOfMemoryError. You can verify this in Cloudera Manager by:

 

- going to Cloudera Management Service --> Instances --> Navigator Metadata Server --> Processes

- and then clicking on "stdout" in the bottom middle of the page.
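If the stdout output is long, counting the errors by eye is tedious. A Python sketch that scans stdout text (for example, the output pasted later in this thread) and returns the timestamp line printed after each OutOfMemoryError block. It assumes the layout seen in that excerpt, where killparent.sh runs and a `date`-style line such as "Sat Sep 3 11:29:56 EDT 2016" follows; the function name and the file you pass on the command line are hypothetical.

```python
import re
import sys

# Matches date(1)-style lines like "Sat Sep 3 11:29:56 EDT 2016" (day may be space-padded).
DATE_LINE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} [A-Z]{2,5} \d{4}$")


def oom_restarts(stdout_text):
    """Return the first timestamp line that follows each java.lang.OutOfMemoryError line."""
    stamps = []
    pending = False
    for raw in stdout_text.splitlines():
        line = raw.strip()
        # The "-XX:OnOutOfMemoryError=" flag does not contain this text, so it is not counted.
        if "java.lang.OutOfMemoryError" in line:
            pending = True
        elif pending and DATE_LINE.match(line):
            stamps.append(line)
            pending = False
    return stamps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python oom_restarts.py saved_stdout.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        found = oom_restarts(fh.read())
    print(f"{len(found)} OutOfMemoryError restart(s)")
    for stamp in found:
        print("  relaunched at", stamp)
```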

 

If you see lots of OOMEs (OutOfMemoryErrors), then Navigator likely needs more heap to perform its operations.

If you have enough RAM, you can double the heap size, or start by adding about 2 GB.

 

To edit the configuration, go to:

 

Cloudera Management Service --> Configuration

Then, search for "Java Heap Size of Navigator Metadata Server in Bytes"
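As the name says, this setting is a size in bytes; the stdout output later in this thread shows the 1 GiB setting reaching the JVM as `-Xms1073741824 -Xmx1073741824`, with min and max equal. A small Python sketch for the conversion, useful if you enter the value as a raw byte count; `heap_bytes` and `jvm_heap_flags` are illustrative names, and the flag string only mirrors that stdout excerpt, not how Cloudera Manager builds the command.

```python
GIB = 1024 ** 3  # bytes in one GiB


def heap_bytes(gib):
    """Byte count for a heap of `gib` GiB."""
    return int(gib * GIB)


def jvm_heap_flags(gib):
    """-Xms/-Xmx pair with min and max set to the same value."""
    size = heap_bytes(gib)
    return f"-Xms{size} -Xmx{size}"


if __name__ == "__main__":
    for gib in (1, 2, 3):
        print(f"{gib} GiB = {heap_bytes(gib)} bytes -> {jvm_heap_flags(gib)}")
```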

 

If you don't see the heap issues in stdout, please let us know.

 

Regards,

 

Ben

Thanks a lot, Ben, for your quick response.

Yes, we are getting lots of OOMEs (OutOfMemoryErrors). In fact, we have plenty of memory (126 GB), but "Java Heap Size of Navigator Metadata Server in Bytes" is currently set to 1 GB. We will try increasing it to 2 GB or more. In the meantime, we need the following clarifications:

a. Is this configuration independent? Can we simply go ahead and increase this parameter? I ask because some configurations depend on others, e.g. Max Memory depends on Min Memory (Max Memory >= Min Memory).

b. How do we make this configuration change take effect? Is a stale-configuration refresh or a restart required after the change, or will Cloudera Management Service pick up the new value automatically as soon as we update it?

c. Are there any other points to note before we change this configuration, such as holding the currently running jobs?

 

# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="/usr/lib64/cmf/service/common/killparent.sh"
# Executing /bin/sh -c "/usr/lib64/cmf/service/common/killparent.sh"...
Sat Sep 3 11:29:56 EDT 2016
JAVA_HOME=/usr/java/default
Executing: /usr/java/default/bin/java -server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Dmgmt.log.file=mgmt-cmf-mgmt-NAVIGATORMETASERVER-<hostname>.log.out -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Xms1073741824 -Xmx1073741824 -XX:MaxPermSize=192m -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -cp /usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/cloudera-navigator-server/nav-server-2.6.0.jar:/usr/share/cmf/cloudera-navigator-server/jars/*:/usr/share/cmf/lib/plugins/tt-instrumentation-5.7.0.jar:/usr/share/cmf/lib/plugins/event-publish-5.7.0-shaded.jar -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/736-cloudera-mgmt-NAVIGATORMETASERVER/log4j.properties -Dnavigator.auditModels.dir=/usr/share/cmf/cloudera-navigator-audit-server/auditModels com.cloudera.nav.server.NavServer /var/run/cloudera-scm-agent/process/736-cloudera-mgmt-NAVIGATORMETASERVER/cloudera-navigator.properties /var/run/cloudera-scm-agent/process/736-cloudera-mgmt-NAVIGATORMETASERVER/cloudera-navigator-cm-auth.properties /var/run/cloudera-scm-agent/process/736-cloudera-mgmt-NAVIGATORMETASERVER/db.navms.properties



# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="/usr/lib64/cmf/service/common/killparent.sh"
# Executing /bin/sh -c "/usr/lib64/cmf/service/common/killparent.sh"...
Sat Sep 3 11:36:26 EDT 2016
JAVA_HOME=/usr/java/default
Executing: /usr/java/default/bin/java -server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Dmgmt.log.file=mgmt-cmf-mgmt-NAVIGATORMETASERVER-<hostname>.log.out -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Xms1073741824 -Xmx1073741824 -XX:MaxPermSize=192m -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -cp /usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/cloudera-navigator-server/nav-server-2.6.0.jar:/usr/share/cmf/cloudera-navigator-server/jars/*:/usr/share/cmf/lib/plugins/tt-instrumentation-5.7.0.jar:/usr/share/cmf/lib/plugins/event-publish-5.7.0-shaded.jar -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/736-cloudera-mgmt-NAVIGATORMETASERVER/log4j.properties -Dnavigator.auditModels.dir=/usr/share/cmf/cloudera-navigator-audit-server/auditModels com.cloudera.nav.server.NavServer /var/run/cloudera-scm-agent/process/736-cloudera-mgmt-NAVIGATORMETASERVER/cloudera-navigator.properties /var/run/cloudera-scm-agent/process/736-cloudera-mgmt-NAVIGATORMETASERVER/cloudera-navigator-cm-auth.properties /var/run/cloudera-scm-agent/process/736-cloudera-mgmt-NAVIGATORMETASERVER/db.navms.properties

 


# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="/usr/lib64/cmf/service/common/killparent.sh"
# Executing /bin/sh -c "/usr/lib64/cmf/service/common/killparent.sh"...
Sat Sep 3 11:41:48 EDT 2016
JAVA_HOME=/usr/java/default
Executing: /usr/java/default/bin/java -server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Dmgmt.log.file=mgmt-cmf-mgmt-NAVIGATORMETASERVER-<hostname>.log.out -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Xms1073741824 -Xmx1073741824 -XX:MaxPermSize=192m -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -cp /usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/cloudera-navigator-server/nav-server-2.6.0.jar:/usr/share/cmf/cloudera-navigator-server/jars/*:/usr/share/cmf/lib/plugins/tt-instrumentation-5.7.0.jar:/usr/share/cmf/lib/plugins/event-publish-5.7.0-shaded.jar -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/736-cloudera-mgmt-NAVIGATORMETASERVER/log4j.properties -Dnavigator.auditModels.dir=/usr/share/cmf/cloudera-navigator-audit-server/auditModels com.cloudera.nav.server.NavServer /var/run/cloudera-scm-agent/process/736-cloudera-mgmt-NAVIGATORMETASERVER/cloudera-navigator.properties /var/run/cloudera-scm-agent/process/736-cloudera-mgmt-NAVIGATORMETASERVER/cloudera-navigator-cm-auth.properties /var/run/cloudera-scm-agent/process/736-cloudera-mgmt-NAVIGATORMETASERVER/db.navms.properties

 

Thanks

Kumar

Hi Kumar,

 

Glad that helped. To answer a few of the questions you posted before confirming that the heap increase helped:

 

(a)  There are no dependencies; indeed, you can simply increase the heap value to increase the max heap (which actually sets min and max to the same value).  Change the heap size and restart Navigator to have it apply.

 

(b)  No other changes, updates, etc. are needed. The heap size change is isolated to Navigator in this case. A restart of Navigator is required, though, to create a new JVM with the new max heap value.

 

(c) Navigator extracts data from all your services based on your cluster configuration, so there is no need to change anything for this update.  When you restart Navigator, it will run extraction to ingest the latest information.

 

Cheers,

 

Ben

@bgooley

 

Increasing "Java Heap Size of Navigator Metadata Server in Bytes" fixes the "NAVIGATORMETASERVER_SCM_HEALTH has become bad" issue, but the same issue comes back after a month.

Please find below the log we maintain internally of the Java heap size increases.

09/06/16 - changed Java Heap Size of Navigator Metadata Server in Bytes from 1 GiB to 2 GiB due to NAVIGATORMETASERVER_SCM_HEALTH bad health
10/18/16 - changed Java Heap Size of Navigator Metadata Server in Bytes from 2 GiB to 3 GiB due to NAVIGATORMETASERVER_SCM_HEALTH bad health
12/01/16 - changed Java Heap Size of Navigator Metadata Server in Bytes from 3 GiB to 4 GiB due to NAVIGATORMETASERVER_SCM_HEALTH bad health
01/17/17 - changed Java Heap Size of Navigator Metadata Server in Bytes from 4 GiB to 5 GiB due to NAVIGATORMETASERVER_SCM_HEALTH bad health

So my question is:

1. What would be the maximum Java heap size? I know it depends on our configuration, but is there any chart to define or identify the maximum, so that I can make sure not to go beyond the recommendation? This is our prod cluster, and I don't want to break anything else just by continually increasing the Java heap size.

 

Hi Team,

 

I have the same question. We have an 18-node cluster and have already set this parameter to 8 GB.

Java Heap Size of Navigator Metadata Server in Bytes: 8 GB

Is there any memory limit for this parameter?

Thanks
Kishore
 

 

 

 

 

The amount of memory to assign to the JVM is relative to the number of documents in the Solr core nav_elements, as per the documentation. Check the role log to get this number for your instance. The JVM sizing formula is number of nav_elements * 200, which gives you a rough estimate of what is required for normal operation.
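A worked version of that formula in Python, as a rough sketch only. The reply does not state the unit of the 200; this sketch assumes it means 200 bytes per nav_elements document, and the 20 million element count is invented for illustration. Under that reading, an 8 GB heap like the one mentioned above would cover roughly 43 million elements.

```python
import math

GIB = 1024 ** 3
BYTES_PER_ELEMENT = 200  # assumption: the "* 200" in the formula is read as bytes


def estimated_heap_gib(nav_elements, bytes_per_element=BYTES_PER_ELEMENT):
    """Rough heap estimate, rounded up to a whole GiB."""
    return math.ceil(nav_elements * bytes_per_element / GIB)


def elements_covered(heap_gib, bytes_per_element=BYTES_PER_ELEMENT):
    """Roughly how many nav_elements a given heap covers under the same formula."""
    return int(heap_gib * GIB // bytes_per_element)


if __name__ == "__main__":
    # 20 million is a made-up count; read the real nav_elements count from the role log.
    print(estimated_heap_gib(20_000_000))  # 4
    print(elements_covered(8))             # 42949672
```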

FYI

 

We increased "Java Heap Size of Navigator Metadata Server in Bytes" from 1 GB to 2 GB and restarted the Cloudera Management Service.

The issue has been resolved.

 

Thanks

Kumar