Expert Contributor
Posts: 64
Registered: ‎11-04-2016
Accepted Solution

CM 6.1 hosts status Unknown Health randomly and periodically

Hi,

After upgrading CM/CDH from 5.16.1 to 6.1, my hosts randomly and periodically show "Unknown Health" for a few seconds and then go back to green. I have not found any WARNING or ERROR in the logs of the hosts or of any service.

The entire cluster works without any issue, and the host inspection and network inspection both ran without any problem. I also synced the time/date a few times just in case, but I can still watch my hosts (and, because of the hosts, the services) go grey with "Unknown Health" and back to green at random for a few seconds.

Cloudera Management Service runs on one server with 14 cores and 28G of memory. I have checked the activity on this server and it is pretty idle, so this is not a busy cluster. Either way, these are the memory settings for the monitoring roles:

Java Heap Size of Activity Monitor in Bytes: 2GB
Java Heap Size of Alert Publisher in Bytes: 256MB
Java Heap Size of EventServer in Bytes: 1GB
Java Heap Size of Host Monitor in Bytes: 4GB
Java Heap Size of Service Monitor in Bytes: 4GB
Maximum Non-Java Memory of Host Monitor: 8GB
Maximum Non-Java Memory of Service Monitor: 12GB
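
As a rough sanity check on the figures above, here is a minimal Python sketch that adds up the configured limits and compares them with the 28G on the management host. It assumes that GB/MB in the settings mean GiB/MiB and that no other roles share the host; the names `LIMITS_MIB`, `total_mib` and `headroom_mib` are made up for this illustration. These values are configured ceilings, not measured usage, so a negative headroom only means the roles could ask for more than the host has if every one of them reached its maximum.

```python
# Configured maximums quoted in the post, converted to MiB.
# Assumption: "GB" and "MB" in the settings mean GiB and MiB.
LIMITS_MIB = {
    "Activity Monitor heap": 2 * 1024,
    "Alert Publisher heap": 256,
    "EventServer heap": 1 * 1024,
    "Host Monitor heap": 4 * 1024,
    "Service Monitor heap": 4 * 1024,
    "Host Monitor non-Java memory": 8 * 1024,
    "Service Monitor non-Java memory": 12 * 1024,
}

# Physical memory of the management host as stated in the post (28G).
HOST_MEMORY_MIB = 28 * 1024


def total_mib(limits):
    """Sum of all configured maximums, in MiB."""
    return sum(limits.values())


def headroom_mib(limits, host_memory_mib):
    """Host memory left if every role reached its configured maximum."""
    return host_memory_mib - total_mib(limits)


if __name__ == "__main__":
    print("Configured maximums:", total_mib(LIMITS_MIB), "MiB")
    print("Host memory:", HOST_MEMORY_MIB, "MiB")
    print("Headroom:", headroom_mib(LIMITS_MIB, HOST_MEMORY_MIB), "MiB")
```

Since the server is described as pretty idle, a check like this only tells you whether the limits are worth revisiting; it does not say anything about what the roles actually use.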

Do you guys have any advice on how to diagnose the possible issue?

Many thanks.

Attachments: Screenshot 2019-01-17 16.33.37.png, Screenshot 2019-01-17 16.23.19.png, Screenshot 2019-01-17 16.22.52.png

Explorer
Posts: 9
Registered: ‎06-19-2018

Re: CM 6.1 hosts status Unknown Health randomly and periodically

How did you do the upgrade? For me it is impossible to initialize any service. I am trying the same thing as you: upgrading from 5.16.1 to 6.1. This is my error:

0', u'expected_exitcodes': [], u'run_generation': 2, u'start_timeout_seconds': None, u'optional_tags': [u'hdfs-client-plugin', u'sentry-plugin'], u'parcels': {u'SPARK2': u'2.2.0.cloudera2-1.cdh5.12.0.p0.232957', u'CDH': u'5.16.1-1.cdh5.16.1.p0.3', u'KAFKA': u'3.1.0-1.3.1.0.p0.35'}}, {u'refresh_files': [], u'config_generation': 0, u'auto_restart': False, u'one_off': True, u'special_file_info': [], u'id': 8272, u'status_links': {}, u'extra_groups': [], u'environment': {u'HOST_STATISTICS_DIR': u'clouderapre-mgr.fintonic.com-caa7e6d9-d5ae-4ece-bb89-bc13333c9aa9-10.0.199.102-host-statistics', u'CM_HOST_NAME': u'clouderapre-mgr.fintonic.com', u'SET_PYTHON_PATH': u'true', u'TIMEOUT': u'60', u'REDACTION_RULES_FILE': u'redaction-rules.json', u'JAVA_HOME': u'/usr/java/jdk1.8.0_151'}, u'program': u'support/collect_host_stats.sh', u'arguments': [], u'resources': [], u'running': False, u'required_tags': [], u'user': u'root', u'group': u'root', u'name': u'collect-host-statistics', u'configuration_data': 'PK\x03\x04\x14\x00\x08\x08\x08\x00\x80"UN\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x14\x00\x00\x00redaction-rules.json\xad\x94AO\xc3 \x18\x86\xef\xfb\x15\x84\xa3\xab\x07\xafMv\x98v7\xa3f\xd3xh\xbb\x84\xb4\xdf&\xae\xb6\x0b\xe0\x8cY\xf6\xdf\x85\xe2`Y\xa8a`\x0f\xe4\xfbxy\xe9\xfb\x84\xc0~\x84\x10\xde\x01\xe3\xb4kq\x8an\x12\xd5\xb3\xcf\x06\xb8\xecr\xd9 
\xb4\xefG9]\x03\xaf\x18\xdd\n\xbd\x14\xcf\xa1&\x95@[\xc2\xf9W\xc7j\x8eV\xac\xfb@\xef\xbck\xd1\x8a\xaa\r\x92\xa3\xb1"\x1c\x16\xd0r*\xe8\x0e\xa4uE\x1a\x0eF\x15\x8c\xae\xd7\xc0\xd4\x96\xc7\xbd\xac\x95\x03a\xd5\x9b\xd2\n\xa3\x168G\xe5U\xaa\x06Y.\x0b\\\x8e\x0bl-\x0c\xb6\r\xa9\xe0\xdc\x93\xa2\x02\xdf\xbe<d\xf7\xb3\xeb\xf9,\x9b\xde=\xcf2\xe9\xeaM\x87\xe4\x12\xd0\t"mm\xba\xf4\x9f1\x8fZ\x9eN\xca|)3\x17\xf2+\xc7N<\x93\xe8\x8c\xebb\xaaS\xa6p\xa2a\x1e\x7f\x9ah\x16K\x12\xca1D\xe1\xcb\x10C\xf04],^\x1f\xe7Y\x12\x90\xdd\xe1\xb5\x00FT\xf7\xe5\x0f\x00\xb3\x0eEPl\xe0[\x1f\x83,BNA\xda\\\x0cr\xda\xe7\x0c\xd4\xdf#\xc2s\xa8\x18\x08\x9d_\xd7!\x08\xda\xe9\xa2\xd0\x8a\x0f\xc8o\x92\x08\x16\xe9\xaf\xa1\x15\x944\x9a\xc7\xf6!L\xd6\xed\xe2\xb2\xaa\x0f\xdbI\xb2\x08>\xd1m\xa0\xd5h}\x19B\xd5\x1b]@\xbd\xe0\xc3\xa2S\xc4\xdd\x17Z\x9b\x1bC\x83^\xe0\xde8pk\xa8\xd7\xfb\xabS\xb81\xe4X\x8e\x0e\xa3\x1fPK\x07\x08\xb9\x86\x81Pp\x01\x00\x00.\x08\x00\x00PK\x01\x02\x14\x00\x14\x00\x08\x08\x08\x00\x80"UN\xb9\x86\x81Pp\x01\x00\x00.\x08\x00\x00\x14\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00redaction-rules.jsonPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00B\x00\x00\x00\xb2\x01\x00\x00\x00\x00', u'expected_exitcodes': [], u'run_generation': 2, u'start_timeout_seconds': None, u'optional_tags': [], u'parcels': {}}], u'server_manages_parcels': True, u'heartbeat_interval': 15, u'parcels_directory': u'/opt/cloudera/parcels', u'host_id': u'caa7e6d9-d5ae-4ece-bb89-bc13333c9aa9', u'cm_guid': u'1e8e8289-ab7a-4a18-b08e-06b9afac0d73', u'eventserver_host': u'clouderapre-node1.fintonic.com', u'enabled_metric_reporters': [u'SPARK_ON_YARN-SPARK_YARN_HISTORY_SERVER', u'SPARK_YARN_HISTORY_SERVER', u'HBASE-HBASERESTSERVER', u'HBASERESTSERVER', u'SPARK', u'SPARK', u'ACCUMULO_C6-ACCUMULO_GC', u'ACCUMULO_GC', u'HBASE', u'HBASE', u'MGMT-EVENTSERVER', u'EVENTSERVER', u'KAFKA-KAFKA_BROKER', u'KAFKA_BROKER', u'HBASE-REGIONSERVER', u'REGIONSERVER', u'LUNA_KMS-HSMKP_LUNA', u'HSMKP_LUNA', u'THALES_KMS', u'THALES_KMS', 
u'IMPALA-IMPALAD', u'IMPALAD', u'AUTH-AUTHSERVER', u'AUTHSERVER', u'ISILON', u'ISILON', u'YARN-NODEMANAGER', u'NODEMANAGER', u'MAPREDUCE', u'MAPREDUCE', u'ACCUMULO16-ACCUMULO16_TRACER', u'ACCUMULO16_TRACER', u'KMS', u'KMS', u'ACCUMULO16-ACCUMULO16_MONITOR', u'ACCUMULO16_MONITOR', u'YARN-JOBHISTORY', u'JOBHISTORY', u'KEYTRUSTEE', u'KEYTRUSTEE', u'HDFS-JOURNALNODE', u'JOURNALNODE', u'KAFKA', u'KAFKA', u'SPARK-SPARK_HISTORY_SERVER', u'SPARK_HISTORY_SERVER', u'HDFS-NAMENODE', u'NAMENODE', u'MAPREDUCE-TASKTRACKER', u'TASKTRACKER', u'IMPALA-CATALOGSERVER', u'CATALOGSERVER', u'SPARK2_ON_YARN', u'SPARK2_ON_YARN', u'KUDU-KUDU_MASTER', u'KUDU_MASTER', u'LUNA_KMS', u'LUNA_KMS', u'HDFS-DSSDDATANODE', u'DSSDDATANODE', u'SENTRY', u'SENTRY', u'ACCUMULO16-ACCUMULO16_GC', u'ACCUMULO16_GC', u'MGMT-NAVIGATOR', u'NAVIGATOR', u'MGMT-TELEMETRYPUBLISHER', u'TELEMETRYPUBLISHER', u'HIVE', u'HIVE', u'ACCUMULO_C6-ACCUMULO_MASTER', u'ACCUMULO_MASTER', u'SQOOP-SQOOP_SERVER', u'SQOOP_SERVER', u'KAFKA-KAFKA_MIRROR_MAKER', u'KAFKA_MIRROR_MAKER', u'KUDU', u'KUDU', u'ACCUMULO16-ACCUMULO16_MASTER', u'ACCUMULO16_MASTER', u'OOZIE', u'OOZIE', u'SQOOP_CLIENT', u'SQOOP_CLIENT', u'OOZIE-OOZIE_SERVER', u'OOZIE_SERVER', u'HDFS-FAILOVERCONTROLLER', u'FAILOVERCONTROLLER', u'YARN', u'YARN', u'HDFS-NFSGATEWAY', u'NFSGATEWAY', u'HDFS-HTTPFS', u'HTTPFS', u'HUE-KT_RENEWER', u'KT_RENEWER', u'KEYTRUSTEE_SERVER-DB_ACTIVE', u'DB_ACTIVE', u'KS_INDEXER-HBASE_INDEXER', u'HBASE_INDEXER', u'ACCUMULO_C6-ACCUMULO_TSERVER', u'ACCUMULO_TSERVER', u'ACCUMULO16', u'ACCUMULO16', u'KEYTRUSTEE-KMS_KEYTRUSTEE', u'KMS_KEYTRUSTEE', u'SOLR-SOLR_SERVER', u'SOLR_SERVER', u'HOST', u'KEYTRUSTEE_SERVER-KEYTRUSTEE_PASSIVE_SERVER', u'KEYTRUSTEE_PASSIVE_SERVER', u'IMPALA-STATESTORE', u'STATESTORE', u'HDFS-DATANODE', u'DATANODE', u'YARN-RESOURCEMANAGER', u'RESOURCEMANAGER', u'HUE-HUE_SERVER', u'HUE_SERVER', u'MGMT-NAVIGATORMETASERVER', u'NAVIGATORMETASERVER', u'HBASE-MASTER', u'MASTER', u'KEYTRUSTEE_SERVER-DB_PASSIVE', u'DB_PASSIVE', 
u'SPARK_ON_YARN', u'SPARK_ON_YARN', u'SPARK2_ON_YARN-SPARK2_YARN_HISTORY_SERVER', u'SPARK2_YARN_HISTORY_SERVER', u'MGMT-REPORTSMANAGER', u'REPORTSMANAGER', u'MGMT-SERVICEMONITOR', u'SERVICEMONITOR', u'MGMT-ALERTPUBLISHER', u'ALERTPUBLISHER', u'HIVE-HIVESERVER2', u'HIVESERVER2', u'MGMT-ACTIVITYMONITOR', u'ACTIVITYMONITOR', u'AUTH-AUTH_LOAD_BALANCER', u'AUTH_LOAD_BALANCER', u'MAPREDUCE-FAILOVERCONTROLLER', u'FAILOVERCONTROLLER', u'ZOOKEEPER', u'ZOOKEEPER', u'MGMT-HOSTMONITOR', u'HOSTMONITOR', u'AUTH', u'AUTH', u'IMPALA', u'IMPALA', u'KEYTRUSTEE_SERVER-KEYTRUSTEE_ACTIVE_SERVER', u'KEYTRUSTEE_ACTIVE_SERVER', u'SOLR', u'SOLR', u'ACCUMULO_C6', u'ACCUMULO_C6', u'ACCUMULO_C6-ACCUMULO_TRACER', u'ACCUMULO_TRACER', u'ACCUMULO16-ACCUMULO16_TSERVER', u'ACCUMULO16_TSERVER', u'LUNA_KMS-HSMKP_METASTORE_LUNA', u'HSMKP_METASTORE_LUNA', u'HBASE-HBASETHRIFTSERVER', u'HBASETHRIFTSERVER', u'ACCUMULO_C6-ACCUMULO_MONITOR', u'ACCUMULO_MONITOR', u'FLUME', u'FLUME', u'HUE', u'HUE', u'HDFS-SECONDARYNAMENODE', u'SECONDARYNAMENODE', u'SENTRY-SENTRY_SERVER', u'SENTRY_SERVER', u'THALES_KMS-HSMKP_METASTORE_THALES', u'HSMKP_METASTORE_THALES', u'HIVE-HIVEMETASTORE', u'HIVEMETASTORE', u'IMPALA-LLAMA', u'LLAMA', u'SPARK-SPARK_WORKER', u'SPARK_WORKER', u'MGMT', u'MGMT', u'HIVE-WEBHCAT', u'WEBHCAT', u'SQOOP', u'SQOOP', u'HUE-HUE_LOAD_BALANCER', u'HUE_LOAD_BALANCER', u'FLUME-AGENT', u'AGENT', u'HDFS', u'HDFS', u'KUDU-KUDU_TSERVER', u'KUDU_TSERVER', u'KMS-KMS', u'KMS', u'KS_INDEXER', u'KS_INDEXER', u'SPARK-SPARK_MASTER', u'SPARK_MASTER', u'ZOOKEEPER-SERVER', u'SERVER', u'KEYTRUSTEE_SERVER', u'KEYTRUSTEE_SERVER', u'MAPREDUCE-JOBTRACKER', u'JOBTRACKER', u'THALES_KMS-HSMKP_THALES', u'HSMKP_THALES'], u'flood_seed_timeout': 100, u'eventserver_port': 7185}
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1528, in handle_heartbeat_response
    self._handle_heartbeat_response(response)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1661, in _handle_heartbeat_response
    self._update_parcel_activation_state(response)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/agent.py", line 1600, in _update_parcel_activation_state
    manage_new_parcels)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/parcel.py", line 640, in configure_all_symlinks
    self.ensure_active_symlink(prod[version], False)
KeyError: '5.15.1-1.cdh5.15.1.p0.4'

Expert Contributor
Posts: 64
Registered: ‎11-04-2016

Re: CM 6.1 hosts status Unknown Health randomly and periodically

Sorry, I forgot about this issue. I solved it by reading the error logs and fixing my NTP communication. The problem was not related to 6.1 or to upgrading from 5.16.1 to 6.x, so I am going to say that, as usual, it was related to Time/Date/NTP.
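
For anyone landing here with the same symptom, one way to check a host's clock against an NTP server is a plain SNTP query. The sketch below is only an illustration and not the check that was used in this thread: it sends a minimal client request over UDP port 123 and applies the standard NTP offset formula to the timestamps in the reply. The function names and the server name in the usage line are placeholders; use whatever NTP server your cluster is configured for.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def ntp_to_unix(seconds, fraction):
    """Convert a 64-bit NTP timestamp (seconds, 32-bit fraction) to Unix time."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def parse_server_times(packet):
    """Return (receive, transmit) server timestamps from a 48-byte NTP reply."""
    if len(packet) < 48:
        raise ValueError("NTP reply is shorter than 48 bytes")
    # Bytes 32-39 hold the receive timestamp, bytes 40-47 the transmit timestamp.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", packet[32:48])
    return ntp_to_unix(rx_s, rx_f), ntp_to_unix(tx_s, tx_f)


def clock_offset(t0, t1, t2, t3):
    """Standard NTP offset: positive means the server clock is ahead of ours.

    t0 = client send, t1 = server receive, t2 = server send, t3 = client receive.
    """
    return ((t1 - t0) + (t2 - t3)) / 2


def query_offset(server, timeout=2.0):
    """Send one SNTP request to `server` and return the estimated offset in seconds."""
    # First byte 0x1B: leap indicator 0, version 3, mode 3 (client).
    request = b"\x1b" + 47 * b"\0"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t0 = time.time()
        sock.sendto(request, (server, 123))
        packet, _ = sock.recvfrom(512)
        t3 = time.time()
    t1, t2 = parse_server_times(packet)
    return clock_offset(t0, t1, t2, t3)


if __name__ == "__main__":
    # Placeholder server name; replace with the NTP server your hosts use.
    print("offset: %+.3f s" % query_offset("pool.ntp.org"))
```

Running this on each host, including the one with Cloudera Management Service, gives a quick view of whether any host is drifting or cannot reach the NTP server at all (the query times out).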