Created on 08-03-2016 02:02 PM - edited 09-16-2022 03:32 AM
Hi all!
It is now time to replace our Cloudera 4 with Cloudera 5!
I started with the first step, installing the new hardware. That part is done!
Now I am starting the application installation and configuration. FYI, I will install Cloudera 5.7.1.
I have one machine for the Cloudera Manager and one for a MySQL server with "monitor/metastore".
I installed the Manager using the MySQL connection information for the "scm" database.
Everything went fine.
Then I integrated the Manager machine into a new cluster and configured the Host Monitor to access the monitor database.
It works, and now all Cloudera Manager services work fine. I think!
Now I would like to integrate 4 new nodes into the cluster.
2 will serve as "HA" name nodes, and the 2 others as datanodes. Six other nodes will join the cluster later.
So, with Cloudera Manager, I launched the wizard to add the new hosts.
I answered all the questions and the installation started. Everything went fine and the installation completed successfully.
But after that, it has to run the "inspector job" on all hosts.
And here I encountered this warning:
newnode.domain.ltd: Command aborted because of exception: Command timed-out after 150 seconds
4 hosts are reporting with NONE CDH version
There are mismatched versions across the system, which will cause failures. See below for details on which hosts are running what versions of components.
I use a local reposync mirror and sync only the 5.7.1 version. So there should be no problem with different versions; 5.7.1 is installed everywhere.
Then, if I go to the Manager interface, I can see all hosts in the "Hosts" section. But all of them are in red status.
If I select one, I get this message:
This host is in contact with the Cloudera Manager Server. This host is not in contact with the Host Monitor.
So the Cloudera Manager Agent seems to be OK, but it seems it can't contact the "Host Monitor"...
But the "Host Monitor" is installed and configured on the same server as the Cloudera Manager. So...
And the status of the "Host Monitor" is green, and it can contact my remote MySQL database.
So I don't know why I get this error.
There is no firewall between my machines, and no SELinux.
I don't know why the nodes can contact the "Cloudera Manager" but not the "Host Monitor".
The /etc/cloudera-scm-agent/config.ini config file is OK: correct IP and port.
server_host=10.x.x.x
server_port=7182
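To double-check the "no firewall" assumption from a new node, a small TCP probe can help. This is only a sketch: port 7182 is the server_port from config.ini, port 9995 and the host name clouderamanager.domain.ltd are taken from the HOSTMONITOR entry in the agent log further down, and the names ENDPOINTS, can_connect and check_endpoints are made up for this example. A successful connection only shows that the port is reachable, not that the Host Monitor accepts the agent's data.

```python
import socket

# Endpoints as they appear in this post; adjust them to your environment.
# 7182 is the server_port from config.ini, 9995 is the HOSTMONITOR port
# reported in the agent's heartbeat response.
ENDPOINTS = {
    "Cloudera Manager Server": ("clouderamanager.domain.ltd", 7182),
    "Host Monitor": ("clouderamanager.domain.ltd", 9995),
}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection also resolves the host name, so a DNS problem
        # shows up here as a failure too.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each endpoint name to True (reachable) or False (not reachable)."""
    return {
        name: can_connect(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }
```

Calling check_endpoints(ENDPOINTS) on a new node and printing the result shows whether both ports answer. Since config.ini uses an IP address while the heartbeat response gives a host name, a failure on 9995 only could also point to name resolution rather than a firewall.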
On the Cloudera Manager machine, "cloudera-scm-server" and "cloudera-scm-agent" run fine.
On my new nodes, "cloudera-scm-agent" runs fine too. But I get this error in the log.
I am not sure it is the cause, and if it is, I don't know how to solve it.
[03/Aug/2016 20:56:32 +0000] 158533 MainThread agent INFO Flood daemon (re)start attempt [03/Aug/2016 20:56:32 +0000] 158533 MainThread agent ERROR Failed to handle Heartbeat Response: {u'firehoses': [{u'roletype': u'ACTIVITYMONITOR', u'rolename': u'mgmt-ACTIVITYMONITOR-728c1b31088c1d8ddc2547d70b884cf7', u'port': 9999, u'report_interval': 60, u'address': u'clouderamanager.domain.ltd'}, {u'roletype': u'SERVICEMONITOR', u'rolename': u'mgmt-SERVICEMONITOR-728c1b31088c1d8ddc2547d70b884cf7', u'port': 9997, u'report_interval': 60, u'address': u'clouderamanager.domain.ltd'}, {u'roletype': u'HOSTMONITOR', u'rolename': u'mgmt-HOSTMONITOR-728c1b31088c1d8ddc2547d70b884cf7', u'port': 9995, u'report_interval': 60, u'address': u'clouderamanager.domain.ltd'}], u'rm_enabled': False, u'client_configs': [], u'create_parcel_symlinks': True, u'server_managed_parcels': [], u'extra_configs': None, u'host_collection_config_data': [{u'config_name': u'host_network_interface_collection_filter', u'config_value': u'^lo$'}, {u'config_name': u'host_disk_collection_filter', u'config_value': u'^$'}, {u'config_name': u'host_fs_collection_filter', u'config_value': u'^$'}, {u'config_name': u'host_log_tailing_config', u'config_value': u'{}\n'}, {u'config_name': u'host_dns_resolution_duration_thresholds', u'config_value': u'{"critical":"never","warning":"1000.0"}'}, {u'config_name': u'host_dns_resolution_enabled', u'config_value': u'true'}, {u'config_name': u'host_clock_offset_thresholds', u'config_value': u'{"critical":"10000.0","warning":"3000.0"}'}], u'apply_parcel_users_groups_permissions': True, u'flood_torrent_port': 7191, u'log_tailing_config': u'{}\n', u'active_parcels': {}, u'flood_rack_peers': [u'10.2.0.33:7191', u'10.2.0.31:7191', u'10.2.0.34:7191', u'10.2.0.29:7191', u'10.2.0.30:7191'], u'retain_parcels_in_cache': True, u'processes': [{u'status_links': {}, u'name': u'cluster-host-inspector', u'config_generation': 0, u'configuration_data': 
'PK\x03\x04\x14\x00\x08\x08\x08\x00\x83\x90\x03I\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00input.json\xb5\xd2\xbb\x0e\xc2 \x18\x05\xe0\xbdOA\x98[\x02\xbd\x98\xe8\xd6\xe8\xd0\xc5\xd4\xb8\x1a\x07\x14\x92\x12)4\xa5\x9d\x9a\xbe\xbb\x80q\x04\xbb8r\xfe\xc3\x07\t,\t\x00\x90J\xd9h3\x19\x08\x0e\xe0\x06\x16\x1b\xd9\xb0\xb3\x89\xa2=w!4Jd\x1deZ\x0f\x19\xc69\x923\x13\x14\xd9Y\xfa\xe9\n\xe6Z\xd5w5\xd4\x8c\x8d\xdcx\x0f\x12\x8cr\x84Q\x81\xa1\x9d\xae\xe9o~\x17\xe0\xf3(_n\xe5I\x80/c|\xbe\xdf\xcaW\x01\xbe\x88\xde\xbe\xd8\xca\x17\x01\x9eDy\xe2ypw%(\x94\x19\xf8s\x12Z\xf9\x92\x9a\xa5\xf4\xf9\x8b\x8f\x0f>js\xe5T\xf6~{S\x9f\xda\xf6\x82\x8e\xed\xd9\x1f\x06\xa7N\x18\xf7Q\xdc\xf0\xaf\xcf\x98\xacoPK\x07\x089m\\\xdd\xbe\x00\x00\x00\x98\x02\x00\x00PK\x01\x02\x14\x00\x14\x00\x08\x08\x08\x00\x83\x90\x03I9m\\\xdd\xbe\x00\x00\x00\x98\x02\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00input.jsonPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x008\x00\x00\x00\xf6\x00\x00\x00\x00\x00', u'refresh_files': [], u'user': u'root', u'parcels': {}, u'auto_restart': False, u'run_generation': 2, u'extra_groups': [], u'environment': {}, u'optional_tags': [], u'running': False, u'program': u'mgmt/mgmt.sh', u'required_tags': [], u'arguments': [u'inspector', u'input.json', u'output.json', u'DEFAULT'], u'special_file_info': [], u'group': u'root', u'id': 34, u'resources': [], u'one_off': True}], u'server_manages_parcels': True, u'heartbeat_interval': 15, u'parcels_directory': u'/opt/cloudera/parcels', u'host_id': u'8a18c6eb-7f32-4e90-a6f9-88d1feeacd21', u'eventserver_host': u'clouderamanager.domain.ltd', u'enabled_metric_reporters': [u'ACCUMULO16', u'ACCUMULO16', u'KEYTRUSTEE-KMS_KEYTRUSTEE', u'KMS_KEYTRUSTEE', u'SPARK_ON_YARN-SPARK_YARN_HISTORY_SERVER', u'SPARK_YARN_HISTORY_SERVER', u'SOLR-SOLR_SERVER', u'SOLR_SERVER', u'HBASE-HBASERESTSERVER', u'HBASERESTSERVER', u'HOST', u'KEYTRUSTEE_SERVER-KEYTRUSTEE_PASSIVE_SERVER', u'KEYTRUSTEE_PASSIVE_SERVER', u'IMPALA-STATESTORE', 
u'STATESTORE', u'SPARK', u'SPARK', u'HBASE', u'HBASE', u'ACCUMULO-ACCUMULO_TRACER', u'ACCUMULO_TRACER', u'HDFS-DATANODE', u'DATANODE', u'ACCUMULO-ACCUMULO_MASTER', u'ACCUMULO_MASTER', u'YARN-RESOURCEMANAGER', u'RESOURCEMANAGER', u'HUE-HUE_SERVER', u'HUE_SERVER', u'ACCUMULO-ACCUMULO_MONITOR', u'ACCUMULO_MONITOR', u'MGMT-EVENTSERVER', u'EVENTSERVER', u'MGMT-NAVIGATORMETASERVER', u'NAVIGATORMETASERVER', u'HBASE-MASTER', u'MASTER', u'KAFKA-KAFKA_BROKER', u'KAFKA_BROKER', u'KEYTRUSTEE_SERVER-DB_PASSIVE', u'DB_PASSIVE', u'HBASE-REGIONSERVER', u'REGIONSERVER', u'SPARK_ON_YARN', u'SPARK_ON_YARN', u'MGMT-REPORTSMANAGER', u'REPORTSMANAGER', u'MGMT-SERVICEMONITOR', u'SERVICEMONITOR', u'IMPALA-IMPALAD', u'IMPALAD', u'MGMT-ALERTPUBLISHER', u'ALERTPUBLISHER', u'HIVE-HIVESERVER2', u'HIVESERVER2', u'MGMT-ACTIVITYMONITOR', u'ACTIVITYMONITOR', u'ISILON', u'ISILON', u'YARN-NODEMANAGER', u'NODEMANAGER', u'MAPREDUCE-FAILOVERCONTROLLER', u'FAILOVERCONTROLLER', u'ACCUMULO', u'ACCUMULO', u'MAPREDUCE', u'MAPREDUCE', u'ZOOKEEPER', u'ZOOKEEPER', u'KMS', u'KMS', u'ACCUMULO16-ACCUMULO16_TRACER', u'ACCUMULO16_TRACER', u'ACCUMULO16-ACCUMULO16_MONITOR', u'ACCUMULO16_MONITOR', u'MGMT-HOSTMONITOR', u'HOSTMONITOR', u'YARN-JOBHISTORY', u'JOBHISTORY', u'KEYTRUSTEE', u'KEYTRUSTEE', u'HDFS-JOURNALNODE', u'JOURNALNODE', u'KAFKA', u'KAFKA', u'IMPALA', u'IMPALA', u'SPARK-SPARK_HISTORY_SERVER', u'SPARK_HISTORY_SERVER', u'KEYTRUSTEE_SERVER-KEYTRUSTEE_ACTIVE_SERVER', u'KEYTRUSTEE_ACTIVE_SERVER', u'HDFS-NAMENODE', u'NAMENODE', u'HUE-BEESWAX_SERVER', u'BEESWAX_SERVER', u'SOLR', u'SOLR', u'ACCUMULO16-ACCUMULO16_TSERVER', u'ACCUMULO16_TSERVER', u'MAPREDUCE-TASKTRACKER', u'TASKTRACKER', u'IMPALA-CATALOGSERVER', u'CATALOGSERVER', u'HDFS-DSSDDATANODE', u'DSSDDATANODE', u'SENTRY', u'SENTRY', u'ACCUMULO16-ACCUMULO16_GC', u'ACCUMULO16_GC', u'MGMT-NAVIGATOR', u'NAVIGATOR', u'HIVE', u'HIVE', u'HBASE-HBASETHRIFTSERVER', u'HBASETHRIFTSERVER', u'SQOOP-SQOOP_SERVER', u'SQOOP_SERVER', u'KAFKA-KAFKA_MIRROR_MAKER', 
u'KAFKA_MIRROR_MAKER', u'FLUME', u'FLUME', u'HUE', u'HUE', u'HDFS-SECONDARYNAMENODE', u'SECONDARYNAMENODE', u'SENTRY-SENTRY_SERVER', u'SENTRY_SERVER', u'ACCUMULO-ACCUMULO_TSERVER', u'ACCUMULO_TSERVER', u'ACCUMULO-ACCUMULO_GC', u'ACCUMULO_GC', u'HIVE-HIVEMETASTORE', u'HIVEMETASTORE', u'IMPALA-LLAMA', u'LLAMA', u'ACCUMULO16-ACCUMULO16_MASTER', u'ACCUMULO16_MASTER', u'SPARK-SPARK_WORKER', u'SPARK_WORKER', u'MGMT', u'MGMT', u'HIVE-WEBHCAT', u'WEBHCAT', u'SQOOP', u'SQOOP', u'HUE-HUE_LOAD_BALANCER', u'HUE_LOAD_BALANCER', u'ACCUMULO-ACCUMULO_LOGGER', u'ACCUMULO_LOGGER', u'HDFS', u'HDFS', u'FLUME-AGENT', u'AGENT', u'OOZIE', u'OOZIE', u'SQOOP_CLIENT', u'SQOOP_CLIENT', u'OOZIE-OOZIE_SERVER', u'OOZIE_SERVER', u'KMS-KMS', u'KMS', u'HDFS-FAILOVERCONTROLLER', u'FAILOVERCONTROLLER', u'KS_INDEXER', u'KS_INDEXER', u'SPARK-SPARK_MASTER', u'SPARK_MASTER', u'YARN', u'YARN', u'ZOOKEEPER-SERVER', u'SERVER', u'HDFS-NFSGATEWAY', u'NFSGATEWAY', u'HDFS-HTTPFS', u'HTTPFS', u'HUE-KT_RENEWER', u'KT_RENEWER', u'KEYTRUSTEE_SERVER', u'KEYTRUSTEE_SERVER', u'KEYTRUSTEE_SERVER-DB_ACTIVE', u'DB_ACTIVE', u'MAPREDUCE-JOBTRACKER', u'JOBTRACKER', u'KS_INDEXER-HBASE_INDEXER', u'HBASE_INDEXER'], u'flood_seed_timeout': 100, u'eventserver_port': 7185}
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.1-py2.7.egg/cmf/agent.py", line 1335, in handle_heartbeat_response
    self._handle_heartbeat_response(response)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.1-py2.7.egg/cmf/agent.py", line 1357, in _handle_heartbeat_response
    response["flood_torrent_port"])
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.1-py2.7.egg/cmf/agent.py", line 1823, in handle_heartbeat_flood
    self.mkabsdir(flood_dir, user=FLOOD_FS_USER, group=FLOOD_FS_GROUP, mode=0755)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.1-py2.7.egg/cmf/agent.py", line 1918, in mkabsdir
    path_grp = grp.getgrgid(stat_info.st_gid)[0]
KeyError: 'getgrgid(): gid not found: 167
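The traceback ends in grp.getgrgid(stat_info.st_gid) raising a KeyError for gid 167, which suggests that some directory the agent looks at is owned by a group id with no entry in the node's group database. The log does not show the value of flood_dir, so the sketch below takes the path as an argument; the helper names describe_owner and owners_up_to_root are made up for this example, and checking the parent directories as well is only an assumption about where the unknown gid might sit.

```python
import grp
import os
import pwd


def describe_owner(path):
    """Return (uid, user_name_or_None, gid, group_name_or_None) for path.

    A None name means the numeric id has no entry in the passwd/group
    database, which is the situation that makes grp.getgrgid raise KeyError.
    """
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = None
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = None
    return st.st_uid, user, st.st_gid, group


def owners_up_to_root(path):
    """List (directory, uid, user, gid, group) for path and each parent."""
    current = os.path.abspath(path)
    rows = []
    while True:
        rows.append((current,) + describe_owner(current))
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return rows
```

Running owners_up_to_root on a candidate directory (for example the parcels_directory from the heartbeat response, as a guess) and looking for rows whose group is None would show which directory carries the gid that cannot be resolved.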
Do you have any idea what causes this error?
I hope somebody can help me get a correct configuration so that my nodes turn green.
Regards,
Fabien