
CM 6.1 hosts status Unknown Health randomly and periodically

Expert Contributor

Hi,

After upgrading CM/CDH from 5.16.1 to 6.1, my hosts randomly and periodically show "Unknown Health" for a few seconds and then go back to green. I have not found any WARNING or ERROR entries in the logs of the hosts or of any service.

The entire cluster works without any issue, and both the host inspection and the network inspection ran without any problem. I also synced the time/date a few times just in case, but I can still watch my hosts (and, because of the hosts, the services) go grey with "Unknown Health" and back to green at random for a few seconds.

The Cloudera Management Service runs on a single server with 14 cores and 28 GB of memory. I have checked the activity on this server and it is mostly idle; the cluster is not busy. Either way, these are the heap sizes for the monitoring roles:

Java Heap Size of Activity Monitor in Bytes: 2GB
Java Heap Size of Alert Publisher in Bytes: 256MB
Java Heap Size of EventServer in Bytes: 1GB
Java Heap Size of Host Monitor in Bytes: 4GB
Java Heap Size of Service Monitor in Bytes: 4GB
Maximum Non-Java Memory of Host Monitor: 8GB
Maximum Non-Java Memory of Service Monitor: 12GB

Do you have any advice on how to diagnose the possible issue?

Many thanks.



Accepted Solutions

Re: CM 6.1 hosts status Unknown Health randomly and periodically

Expert Contributor
Sorry, I forgot about this issue. I solved it by reading the error logs and fixing my NTP communication. The problem was not related to 6.1 or to upgrading from 5.16.1 to 6.x; as usual, it was related to time/date/NTP.
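
For anyone hitting the same symptom, here is a minimal sketch of a per-host NTP check, assuming the hosts run ntpd and that the output of `ntpq -pn` is available as text. The function names and the 100 ms threshold are hypothetical choices for illustration; they are not taken from Cloudera Manager.

```python
def selected_peer_offset_ms(ntpq_output):
    """Return the offset in ms of the peer ntpd has selected, or None.

    In `ntpq -p` output the selected system peer is the line starting
    with '*'; the columns are:
    remote refid st t when poll reach delay offset jitter
    """
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            fields = line.split()
            return float(fields[8])  # the "offset" column, in milliseconds
    return None  # no peer selected: the clock is not synchronized


def clock_looks_healthy(ntpq_output, max_offset_ms=100.0):
    """True if a peer is selected and its offset is within the threshold.

    max_offset_ms is an assumed value; pick one that suits your cluster.
    """
    offset = selected_peer_offset_ms(ntpq_output)
    return offset is not None and abs(offset) <= max_offset_ms
```

The text can be captured on each host with something like `subprocess.run(["ntpq", "-pn"], capture_output=True, text=True).stdout`. A host where no line starts with `*` has not selected a time source, which is worth checking before looking at the monitoring roles.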

Re: CM 6.1 hosts status Unknown Health randomly and periodically

Explorer

How did you do the upgrade? For me it is impossible to initialize any service. I am attempting the same upgrade as you, from 5.16.1 to 6.1. This is my error:

0', u'expected_exitcodes': [], u'run_generation': 2, u'start_timeout_seconds': None, u'optional_tags': [u'hdfs-client-plugin', u'sentry-plugin'], u'parcels': {u'SPARK2': u'2.2.0.cloudera2-1.cdh5.12.0.p0.232957', u'CDH': u'5.16.1-1.cdh5.16.1.p0.3', u'KAFKA': u'3.1.0-1.3.1.0.p0.35'}}, {u'refresh_files': [], u'config_generation': 0, u'auto_restart': False, u'one_off': True, u'special_file_info': [], u'id': 8272, u'status_links': {}, u'extra_groups': [], u'environment': {u'HOST_STATISTICS_DIR': u'clouderapre-mgr.fintonic.com-caa7e6d9-d5ae-4ece-bb89-bc13333c9aa9-10.0.199.102-host-statistics', u'CM_HOST_NAME': u'clouderapre-mgr.fintonic.com', u'SET_PYTHON_PATH': u'true', u'TIMEOUT': u'60', u'REDACTION_RULES_FILE': u'redaction-rules.json', u'JAVA_HOME': u'/usr/java/jdk1.8.0_151'}, u'program': u'support/collect_host_stats.sh', u'arguments': [], u'resources': [], u'running': False, u'required_tags': [], u'user': u'root', u'group': u'root', u'name': u'collect-host-statistics', u'configuration_data': 'PK\x03\x04\x14\x00\x08\x08\x08\x00\x80"UN\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x14\x00\x00\x00redaction-rules.json\xad\x94AO\xc3 \x18\x86\xef\xfb\x15\x84\xa3\xab\x07\xafMv\x98v7\xa3f\xd3xh\xbb\x84\xb4\xdf&\xae\xb6\x0b\xe0\x8cY\xf6\xdf\x85\xe2`Y\xa8a`\x0f\xe4\xfbxy\xe9\xfb\x84\xc0~\x84\x10\xde\x01\xe3\xb4kq\x8an\x12\xd5\xb3\xcf\x06\xb8\xecr\xd9 
\xb4\xefG9]\x03\xaf\x18\xdd\n\xbd\x14\xcf\xa1&\x95@[\xc2\xf9W\xc7j\x8eV\xac\xfb@\xef\xbck\xd1\x8a\xaa\r\x92\xa3\xb1"\x1c\x16\xd0r*\xe8\x0e\xa4uE\x1a\x0eF\x15\x8c\xae\xd7\xc0\xd4\x96\xc7\xbd\xac\x95\x03a\xd5\x9b\xd2\n\xa3\x168G\xe5U\xaa\x06Y.\x0b\\\x8e\x0bl-\x0c\xb6\r\xa9\xe0\xdc\x93\xa2\x02\xdf\xbe<d\xf7\xb3\xeb\xf9,\x9b\xde=\xcf2\xe9\xeaM\x87\xe4\x12\xd0\t"mm\xba\xf4\x9f1\x8fZ\x9eN\xca|)3\x17\xf2+\xc7N<\x93\xe8\x8c\xebb\xaaS\xa6p\xa2a\x1e\x7f\x9ah\x16K\x12\xca1D\xe1\xcb\x10C\xf04],^\x1f\xe7Y\x12\x90\xdd\xe1\xb5\x00FT\xf7\xe5\x0f\x00\xb3\x0eEPl\xe0[\x1f\x83,BNA\xda\\\x0cr\xda\xe7\x0c\xd4\xdf#\xc2s\xa8\x18\x08\x9d_\xd7!\x08\xda\xe9\xa2\xd0\x8a\x0f\xc8o\x92\x08\x16\xe9\xaf\xa1\x15\x944\x9a\xc7\xf6!L\xd6\xed\xe2\xb2\xaa\x0f\xdbI\xb2\x08>\xd1m\xa0\xd5h}\x19B\xd5\x1b]@\xbd\xe0\xc3\xa2S\xc4\xdd\x17Z\x9b\x1bC\x83^\xe0\xde8pk\xa8\xd7\xfb\xabS\xb81\xe4X\x8e\x0e\xa3\x1fPK\x07\x08\xb9\x86\x81Pp\x01\x00\x00.\x08\x00\x00PK\x01\x02\x14\x00\x14\x00\x08\x08\x08\x00\x80"UN\xb9\x86\x81Pp\x01\x00\x00.\x08\x00\x00\x14\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00redaction-rules.jsonPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00B\x00\x00\x00\xb2\x01\x00\x00\x00\x00', u'expected_exitcodes': [], u'run_generation': 2, u'start_timeout_seconds': None, u'optional_tags': [], u'parcels': {}}], u'server_manages_parcels': True, u'heartbeat_interval': 15, u'parcels_directory': u'/opt/cloudera/parcels', u'host_id': u'caa7e6d9-d5ae-4ece-bb89-bc13333c9aa9', u'cm_guid': u'1e8e8289-ab7a-4a18-b08e-06b9afac0d73', u'eventserver_host': u'clouderapre-node1.fintonic.com', u'enabled_metric_reporters': [u'SPARK_ON_YARN-SPARK_YARN_HISTORY_SERVER', u'SPARK_YARN_HISTORY_SERVER', u'HBASE-HBASERESTSERVER', u'HBASERESTSERVER', u'SPARK', u'SPARK', u'ACCUMULO_C6-ACCUMULO_GC', u'ACCUMULO_GC', u'HBASE', u'HBASE', u'MGMT-EVENTSERVER', u'EVENTSERVER', u'KAFKA-KAFKA_BROKER', u'KAFKA_BROKER', u'HBASE-REGIONSERVER', u'REGIONSERVER', u'LUNA_KMS-HSMKP_LUNA', u'HSMKP_LUNA', u'THALES_KMS', u'THALES_KMS', 
u'IMPALA-IMPALAD', u'IMPALAD', u'AUTH-AUTHSERVER', u'AUTHSERVER', u'ISILON', u'ISILON', u'YARN-NODEMANAGER', u'NODEMANAGER', u'MAPREDUCE', u'MAPREDUCE', u'ACCUMULO16-ACCUMULO16_TRACER', u'ACCUMULO16_TRACER', u'KMS', u'KMS', u'ACCUMULO16-ACCUMULO16_MONITOR', u'ACCUMULO16_MONITOR', u'YARN-JOBHISTORY', u'JOBHISTORY', u'KEYTRUSTEE', u'KEYTRUSTEE', u'HDFS-JOURNALNODE', u'JOURNALNODE', u'KAFKA', u'KAFKA', u'SPARK-SPARK_HISTORY_SERVER', u'SPARK_HISTORY_SERVER', u'HDFS-NAMENODE', u'NAMENODE', u'MAPREDUCE-TASKTRACKER', u'TASKTRACKER', u'IMPALA-CATALOGSERVER', u'CATALOGSERVER', u'SPARK2_ON_YARN', u'SPARK2_ON_YARN', u'KUDU-KUDU_MASTER', u'KUDU_MASTER', u'LUNA_KMS', u'LUNA_KMS', u'HDFS-DSSDDATANODE', u'DSSDDATANODE', u'SENTRY', u'SENTRY', u'ACCUMULO16-ACCUMULO16_GC', u'ACCUMULO16_GC', u'MGMT-NAVIGATOR', u'NAVIGATOR', u'MGMT-TELEMETRYPUBLISHER', u'TELEMETRYPUBLISHER', u'HIVE', u'HIVE', u'ACCUMULO_C6-ACCUMULO_MASTER', u'ACCUMULO_MASTER', u'SQOOP-SQOOP_SERVER', u'SQOOP_SERVER', u'KAFKA-KAFKA_MIRROR_MAKER', u'KAFKA_MIRROR_MAKER', u'KUDU', u'KUDU', u'ACCUMULO16-ACCUMULO16_MASTER', u'ACCUMULO16_MASTER', u'OOZIE', u'OOZIE', u'SQOOP_CLIENT', u'SQOOP_CLIENT', u'OOZIE-OOZIE_SERVER', u'OOZIE_SERVER', u'HDFS-FAILOVERCONTROLLER', u'FAILOVERCONTROLLER', u'YARN', u'YARN', u'HDFS-NFSGATEWAY', u'NFSGATEWAY', u'HDFS-HTTPFS', u'HTTPFS', u'HUE-KT_RENEWER', u'KT_RENEWER', u'KEYTRUSTEE_SERVER-DB_ACTIVE', u'DB_ACTIVE', u'KS_INDEXER-HBASE_INDEXER', u'HBASE_INDEXER', u'ACCUMULO_C6-ACCUMULO_TSERVER', u'ACCUMULO_TSERVER', u'ACCUMULO16', u'ACCUMULO16', u'KEYTRUSTEE-KMS_KEYTRUSTEE', u'KMS_KEYTRUSTEE', u'SOLR-SOLR_SERVER', u'SOLR_SERVER', u'HOST', u'KEYTRUSTEE_SERVER-KEYTRUSTEE_PASSIVE_SERVER', u'KEYTRUSTEE_PASSIVE_SERVER', u'IMPALA-STATESTORE', u'STATESTORE', u'HDFS-DATANODE', u'DATANODE', u'YARN-RESOURCEMANAGER', u'RESOURCEMANAGER', u'HUE-HUE_SERVER', u'HUE_SERVER', u'MGMT-NAVIGATORMETASERVER', u'NAVIGATORMETASERVER', u'HBASE-MASTER', u'MASTER', u'KEYTRUSTEE_SERVER-DB_PASSIVE', u'DB_PASSIVE', 
u'SPARK_ON_YARN', u'SPARK_ON_YARN', u'SPARK2_ON_YARN-SPARK2_YARN_HISTORY_SERVER', u'SPARK2_YARN_HISTORY_SERVER', u'MGMT-REPORTSMANAGER', u'REPORTSMANAGER', u'MGMT-SERVICEMONITOR', u'SERVICEMONITOR', u'MGMT-ALERTPUBLISHER', u'ALERTPUBLISHER', u'HIVE-HIVESERVER2', u'HIVESERVER2', u'MGMT-ACTIVITYMONITOR', u'ACTIVITYMONITOR', u'AUTH-AUTH_LOAD_BALANCER', u'AUTH_LOAD_BALANCER', u'MAPREDUCE-FAILOVERCONTROLLER', u'FAILOVERCONTROLLER', u'ZOOKEEPER', u'ZOOKEEPER', u'MGMT-HOSTMONITOR', u'HOSTMONITOR', u'AUTH', u'AUTH', u'IMPALA', u'IMPALA', u'KEYTRUSTEE_SERVER-KEYTRUSTEE_ACTIVE_SERVER', u'KEYTRUSTEE_ACTIVE_SERVER', u'SOLR', u'SOLR', u'ACCUMULO_C6', u'ACCUMULO_C6', u'ACCUMULO_C6-ACCUMULO_TRACER', u'ACCUMULO_TRACER', u'ACCUMULO16-ACCUMULO16_TSERVER', u'ACCUMULO16_TSERVER', u'LUNA_KMS-HSMKP_METASTORE_LUNA', u'HSMKP_METASTORE_LUNA', u'HBASE-HBASETHRIFTSERVER', u'HBASETHRIFTSERVER', u'ACCUMULO_C6-ACCUMULO_MONITOR', u'ACCUMULO_MONITOR', u'FLUME', u'FLUME', u'HUE', u'HUE', u'HDFS-SECONDARYNAMENODE', u'SECONDARYNAMENODE', u'SENTRY-SENTRY_SERVER', u'SENTRY_SERVER', u'THALES_KMS-HSMKP_METASTORE_THALES', u'HSMKP_METASTORE_THALES', u'HIVE-HIVEMETASTORE', u'HIVEMETASTORE', u'IMPALA-LLAMA', u'LLAMA', u'SPARK-SPARK_WORKER', u'SPARK_WORKER', u'MGMT', u'MGMT', u'HIVE-WEBHCAT', u'WEBHCAT', u'SQOOP', u'SQOOP', u'HUE-HUE_LOAD_BALANCER', u'HUE_LOAD_BALANCER', u'FLUME-AGENT', u'AGENT', u'HDFS', u'HDFS', u'KUDU-KUDU_TSERVER', u'KUDU_TSERVER', u'KMS-KMS', u'KMS', u'KS_INDEXER', u'KS_INDEXER', u'SPARK-SPARK_MASTER', u'SPARK_MASTER', u'ZOOKEEPER-SERVER', u'SERVER', u'KEYTRUSTEE_SERVER', u'KEYTRUSTEE_SERVER', u'MAPREDUCE-JOBTRACKER', u'JOBTRACKER', u'THALES_KMS-HSMKP_THALES', u'HSMKP_THALES'], u'flood_seed_timeout': 100, u'eventserver_port': 7185}
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1528, in handle_heartbeat_response
    self._handle_heartbeat_response(response)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1661, in _handle_heartbeat_response
    self._update_parcel_activation_state(response)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1600, in _update_parcel_activation_state
    manage_new_parcels)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/parcel.py", line 640, in configure_all_symlinks
    self.ensure_active_symlink(prod[version], False)
KeyError: '5.15.1-1.cdh5.15.1.p0.4'
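
The traceback ends in a plain dictionary lookup: the agent asks for parcel version 5.15.1, while the heartbeat shown above lists CDH 5.16.1. The sketch below only reproduces that kind of failure with made-up data. The `installed` mapping and the `resolve` helper are hypothetical and are not the agent's actual code.

```python
# Hypothetical data, shaped after the values visible in the post.
installed = {
    "CDH": {
        "5.16.1-1.cdh5.16.1.p0.3": "/opt/cloudera/parcels/CDH-5.16.1-1.cdh5.16.1.p0.3",
    },
}


def resolve(product, version):
    """Return the parcel path for a version, or None if it is not known."""
    return installed.get(product, {}).get(version)


wanted = "5.15.1-1.cdh5.15.1.p0.4"

try:
    # A direct lookup of a version that is not in the mapping raises KeyError.
    installed["CDH"][wanted]
except KeyError as err:
    print("missing parcel version:", err)

# A guarded lookup reports the missing version instead of raising.
print(resolve("CDH", wanted))
```

If this reading is right, something on the host still refers to the 5.15.1 parcel; that is an assumption based only on the key in the error message.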
