03-06-2017 02:55 AM
Env: CDH 5.8.1, Ubuntu 14.04
We ran into a problem when we tried to change the hostname and IP address.
We changed the host in /etc/cloudera-scm-agent/config.ini and in the PostgreSQL database.
After that, we could not start the Cloudera Management Service or any other service (e.g. HDFS), even after changing everything back.
The directory /run/cloudera-scm-agent/process is empty,
and the host status is "Unknown run condition", with no heartbeat and no CDH version found.
The cloudera-scm-agent log shows:
error: [Errno 111] Connection refused
[06/Mar/2017 17:48:50 +0000] 20996 MainThread heartbeat_tracker INFO HB stats (seconds): num:40 LIFE_MIN:0.00 min:0.00 mean:0.01 max:0.01 LIFE_MAX:0.02
[06/Mar/2017 17:55:01 +0000] 20996 MonitorDaemon-Reporter throttling_logger ERROR (10 skipped) Error sending messages to firehose: mgmt_HOSTMONITOR_59a6b670b59fd6bcb192ec82edd2b1a3
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/monitor/firehose.py", line 116, in _send
    self._port)
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
It seems the Host Monitor service cannot start.
I have already changed the hosts file and the DNS configuration, but that did not help.
Does anyone have any suggestions for this problem?
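For anyone debugging a similar setup, a quick way to see which Cloudera Manager server the agent is pointed at is to parse the agent's config.ini. The following is only a minimal sketch: it assumes the file has a [General] section with server_host and server_port keys and that 7182 is the default port; the function name read_server_endpoint is made up for illustration.

```python
import configparser


def read_server_endpoint(config_text):
    """Return (server_host, server_port) parsed from agent config.ini text.

    Assumption: the keys live in a [General] section, and the port
    falls back to 7182 when it is not set explicitly.
    """
    # interpolation=None so that a literal '%' in a value is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    host = parser.get("General", "server_host", fallback=None)
    port = parser.getint("General", "server_port", fallback=7182)
    return host, port


# Illustrative content only; on a real host you would pass the text of
# /etc/cloudera-scm-agent/config.ini instead.
SAMPLE = """
[General]
server_host=node1
server_port=7182
"""

print(read_server_endpoint(SAMPLE))
```

If the returned host is still localhost on a node other than the Cloudera Manager server, the agent on that node would be trying to reach the server on itself.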
03-06-2017 09:12 AM
Thanks for your question.
When you say you changed the hostname and IP address, how was that done? Hosts file, DNS, ...?
What were the old and new hostnames (if you can share them with us)?
Cloudera Manager's agents use a uuid (most of the time) as a unique identifier for a host, so a change of IP or hostname should not impact the heartbeat in that case. If the uuid changed in some way (/var/lib/cloudera-scm-agent/uuid), then you could end up with two entries for the host (old and new hostname) in Cloudera Manager.
First, we need to define the problem, though. You mention that the host where Host Monitor runs is not showing a heartbeat in Cloudera Manager? If so, we need to see the heartbeat exception in the Cloudera Agent log ( /var/log/cloudera-scm-agent/cloudera-scm-agent.log )
Once we have a better handle on what the issue is, we can decide in which direction to take the debugging.
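To check whether the agent's uuid changed, it can help to record the contents of the uuid file before and after the hostname change. A minimal sketch, assuming the file simply contains the identifier as text (read_agent_uuid is a hypothetical helper, not part of Cloudera Manager):

```python
from pathlib import Path


def read_agent_uuid(path="/var/lib/cloudera-scm-agent/uuid"):
    """Return the agent uuid as a string, or None if the file is missing or empty.

    Assumption: the file holds only the identifier, possibly followed
    by a trailing newline.
    """
    uuid_file = Path(path)
    if not uuid_file.is_file():
        return None
    return uuid_file.read_text().strip() or None
```

If the value differs between the two readings, or the file was removed and regenerated, Cloudera Manager will most likely treat the machine as a new host, which would explain seeing two host entries.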
03-06-2017 06:24 PM
Thanks for your help.
The old hostname is localhost with IP 127.0.0.1, and the new hostname is node1 with IP 192.168.3.155.
I changed it in /etc/hosts, /etc/hostname, and the local DNS resolver, and updated the hosts table in the scm database.
I also removed the uuid file in /var/lib/cloudera-scm-agent and executed service cloudera-scm-agent clean_start.
I do get two entries for the host (old and new hostname) in Cloudera Manager.
The old hostname is missing the heartbeat, and no version is detected.
The new hostname has a heartbeat and a detected version, but it has no roles, and none of the services started.
All of the Cloudera Management Service roles failed to start.
The Cloudera Manager home page shows the following message:
"request host monitor timeout, request service monitor timeout"
I checked the Host Monitor and Service Monitor processes; neither of them started.
There are also no logs in /var/log/cloudera-scm-firehose for the new hostname.
So I want to know how to migrate the roles from the old hostname to the new hostname,
and how to solve the Host Monitor start issue.
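Since the old hostname was localhost, one thing worth verifying on each node is whether a given hostname resolves only to a loopback address. Here is a rough sketch using only the Python standard library; the helper names are invented for this example.

```python
import ipaddress
import socket


def resolve_addresses(hostname):
    """Return the sorted, de-duplicated IP addresses the local resolver gives."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The name could not be resolved at all
        return []
    return sorted({info[4][0] for info in infos})


def is_loopback_only(addresses):
    """True if there is at least one address and every one is loopback."""
    if not addresses:
        return False
    # Drop any IPv6 scope suffix such as '%eth0' before parsing
    return all(
        ipaddress.ip_address(addr.split("%")[0]).is_loopback
        for addr in addresses
    )
```

For example, is_loopback_only(resolve_addresses("node1")) returning True on node1 would suggest that /etc/hosts still maps the new name to 127.0.0.1, so other nodes given that name would not be able to reach it.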
03-10-2017 03:03 AM
I deleted the management service and added it again; the Cloudera Management Service now works well.
But how do I migrate the roles from the old hostname to the new hostname?
03-14-2017 06:28 PM
I have fixed this issue now.
The cause was that the Cloudera Management Service was deployed on a server whose hostname was localhost.
When services were deployed to another node B, B sent its messages to localhost:7184 (that is, to itself) and got "connection refused".
The solution was to rename the host to cmxx, add that name to /etc/hosts, /etc/hostname, and DNS,
and update the hosts table in the scm database.
Then I re-deployed the Cloudera Management Service with the new hostname cmxx. Now node B sends its messages to cmxx:7184.
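To confirm the fix from node B, a simple TCP connection test against cmxx:7184 is enough. This is just a sketch with a hypothetical helper name; it only checks that something accepts connections on the port, not that the Host Monitor itself is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures
        return False
```

Running can_connect("cmxx", 7184) on node B should return True after the re-deploy, while can_connect("localhost", 7184) on node B would be expected to return False, which matches the "Connection refused" error in the agent log.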