Cloudera Management Service failed to start

Explorer

Hi

 

Environment: CDH 5.8.1, Ubuntu 14.04

 

We ran into a problem after changing the hostname and IP address.

We changed the host in /etc/cloudera-scm-agent/config.ini and in the PostgreSQL database.

Since then, we cannot start the Cloudera Management Service or any of the other services (e.g. HDFS), even after changing everything back.

The directory /run/cloudera-scm-agent/process is empty, and the host status shows Unknown run condition, no heartbeat, and no CDH version found.
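Because the agent configuration was edited by hand, it can help to check which server the agent is actually told to report to. The following is a minimal Python sketch, assuming the agent's config.ini has a [General] section with server_host and server_port keys; the fallback values ("localhost" and 7182) and the helper names are assumptions for illustration. It flags a server_host that can only ever reach the machine the agent runs on.

```python
import configparser
import ipaddress

# Names that always refer to the local machine (assumed list, not exhaustive).
LOOPBACK_NAMES = {"localhost", "localhost.localdomain"}


def read_server_address(config_text, default_port=7182):
    """Return (server_host, server_port) from config.ini-style text.

    Assumption: the keys live in a [General] section; the fallbacks are
    illustrative defaults, not values read from Cloudera documentation.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    host = parser.get("General", "server_host", fallback="localhost")
    port = parser.getint("General", "server_port", fallback=default_port)
    return host, port


def is_loopback_name(host):
    """True for names or IP literals that always mean 'this machine'."""
    if host.lower() in LOOPBACK_NAMES:
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # An ordinary hostname: it must be resolved to know where it points.
        return False
```

On an agent node this could be used as `host, port = read_server_address(open("/etc/cloudera-scm-agent/config.ini").read())`, followed by `is_loopback_name(host)`; on any node other than the Cloudera Manager server, a True result means the agent is pointed at itself.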

 

The cloudera-scm-agent log shows:

error: [Errno 111] Connection refused
[06/Mar/2017 17:48:50 +0000] 20996 MainThread heartbeat_tracker INFO     HB stats (seconds): num:40 LIFE_MIN:0.00 min:0.00 mean:0.01 max:0.01 LIFE_MAX:0.02
[06/Mar/2017 17:55:01 +0000] 20996 MonitorDaemon-Reporter throttling_logger ERROR    (10 skipped) Error sending messages to firehose: mgmt_HOSTMONITOR_59a6b670b59fd6bcb192ec82edd2b1a3
Traceback (most recent call last):
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/monitor/firehose.py", line 116, in _send
    self._port)
  File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib/python2.7/httplib.py", line 807, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
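[Errno 111] is ECONNREFUSED on Linux: the connection attempt reached a host, but usually nothing was listening on that port, which fits a Host Monitor that never started. Below is a minimal Python sketch for repeating that check by hand from the agent node. The traceback above is cut off before it names the host and port, so those are inputs you supply, and `probe` is a hypothetical helper name.

```python
import socket


def probe(host, port, timeout=3.0):
    """Attempt a TCP connection and summarise the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # errno 111 (ECONNREFUSED) on Linux: reachable, but nothing listening
        return "refused"
    except socket.timeout:
        # No answer at all: often a firewall drop or an unreachable address
        return "timeout"
    except OSError as exc:
        # Name resolution failures and other socket errors
        return "error: %s" % exc
```

For the situation described in the accepted solution, `probe("localhost", 7184)` run on node B would report "refused" as long as nothing on node B itself listens on port 7184.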

 

It seems the Host Monitor service cannot start.

I have already changed the hosts and DNS configuration, but it didn't help.

Does anyone have any suggestions for this problem?

 

Thank you.

Accepted solution

Explorer

 

I have fixed this issue now.

The cause: the Cloudera Management Service had been deployed on a server whose hostname was localhost. When services were then deployed to another node, B, node B sent its messages to localhost:7184 and got "connection refused", because on node B the name localhost refers to node B itself.

The solution was to rename the host to cmxx, add the new name to /etc/hosts, /etc/hostname and DNS, and update the hosts table in the scm database.

Then I re-deployed the Cloudera Management Service with the new hostname cmxx. Finally, node B sent its messages to cmxx:7184.
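The underlying rule is that a name like localhost normally resolves to a loopback address (127.0.0.0/8 or ::1), so whichever machine looks it up ends up connecting to itself. A minimal Python sketch of a check that could be run on each node against the management host's name (for example cmxx); it only consults the local resolver, and the helper names are hypothetical.

```python
import ipaddress
import socket


def resolved_addresses(name):
    """All IP addresses the local resolver returns for name."""
    infos = socket.getaddrinfo(name, None)
    # info[4][0] is the address string; drop any IPv6 zone suffix like "%eth0".
    return {ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos}


def check_management_host(name):
    """Return a warning string if other nodes could not use name, else None."""
    try:
        addrs = resolved_addresses(name)
    except socket.gaierror as exc:
        return "%s does not resolve: %s" % (name, exc)
    if all(addr.is_loopback for addr in addrs):
        listed = ", ".join(sorted(str(a) for a in addrs))
        return ("%s resolves only to loopback (%s); connections from this node "
                "would stay on this node" % (name, listed))
    return None
```

Running `check_management_host("cmxx")` on node B should return None once DNS and /etc/hosts are correct; the same call with "localhost" returns a warning on every node, which is the failure described above.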


5 replies

Master Guru

@kevin001,

 

Thanks for your question.

 

When you say you changed the hostname and IP address, how was that done? Via the hosts file, DNS, ...?

What were the old and new hostnames (if you can share them with us)?

 

Cloudera Manager's agents use a uuid (most of the time) as a unique identifier for a host, so a change of IP or hostname should not impact the heartbeat in that case.  If the uuid changed in some way (/var/lib/cloudera-scm-agent/uuid), then you could end up with two entries for the host (old and new hostname) in Cloudera Manager.
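The identity rule described here can be illustrated with a short sketch: while the contents of /var/lib/cloudera-scm-agent/uuid stay the same, the agent keeps presenting the same identity, and a changed or regenerated file is what could produce a second host entry. Minimal Python, assuming the file holds a single identifier string; the helper names are hypothetical.

```python
from pathlib import Path

# Path taken from the reply above; the one-line format is an assumption.
UUID_PATH = Path("/var/lib/cloudera-scm-agent/uuid")


def read_agent_uuid(path=UUID_PATH):
    """Return the agent's host identifier, or None if the file is missing."""
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None


def identity_changed(recorded, path=UUID_PATH):
    """True if the identifier on disk differs from a previously recorded one."""
    return read_agent_uuid(path) != recorded
```

Recording `read_agent_uuid()` before renaming a host and comparing afterwards with `identity_changed(...)` would show whether the agent will come back as the same host or as a new one.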

 

First, we need to define the problem, though.  You mention that the host where Host Monitor runs is not showing a heartbeat in Cloudera Manager?  If so, we need to see the heartbeat exception in the Cloudera Agent log ( /var/log/cloudera-scm-agent/cloudera-scm-agent.log )

 

Once we have a better handle on what the issue is, we can decide in which direction to take the debugging.

 

Ben

Explorer

@bgooley

 

Thanks for your help.

 

The old hostname is localhost with IP 127.0.0.1, and the new hostname is node1 with IP 192.168.3.155.

I changed it in /etc/hosts, /etc/hostname and the local DNS resolver, and updated the hosts table in the scm database.
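On Ubuntu, /etc/hosts often also contains a 127.0.1.1 line for the machine's own hostname; if the new hostname is left on such a line (or on the 127.0.0.1 line), local lookups may still return a loopback address. A minimal Python sketch that lists every address /etc/hosts assigns to a hostname and flags the loopback ones; it parses the file text only (DNS is not consulted), and the helper names are hypothetical.

```python
import ipaddress


def hosts_entries(hosts_text):
    """Parse /etc/hosts-style text into (ip, [names]) pairs, skipping comments."""
    entries = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        entries.append((fields[0], fields[1:]))
    return entries


def addresses_for(hostname, hosts_text):
    """Every address the file maps hostname to, in file order (exact match)."""
    return [ip for ip, names in hosts_entries(hosts_text) if hostname in names]


def loopback_mappings(hostname, hosts_text):
    """The subset of those addresses that are loopback (127.0.0.0/8 or ::1)."""
    return [ip for ip in addresses_for(hostname, hosts_text)
            if ipaddress.ip_address(ip.split("%")[0]).is_loopback]
```

Calling `loopback_mappings("node1", open("/etc/hosts").read())` should return an empty list once node1 appears only next to 192.168.3.155.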

 

I also removed the uuid file in /var/lib/cloudera-scm-agent and ran service cloudera-scm-agent clean_start.

I do get two entries for the host (old and new hostname) in Cloudera Manager.

The old hostname has no heartbeat and no version detected.

The new hostname has a heartbeat and a detected version, but it has no roles, and none of the services started.

 

None of the Cloudera Management Service roles start either.

The Cloudera Manager home page shows the following messages:

"request host monitor timeout, request service monitor timeout"

I checked for the Host Monitor and Service Monitor processes; neither had started.

There are also no logs in /var/log/cloudera-scm-firehose for the new hostname.

 

So I would like to know how to migrate the roles from the old hostname to the new hostname, and how to solve the Host Monitor startup issue.

 

Thank you

 

Explorer

@bgooley

 

I deleted the Cloudera Management Service and added it again; now it works well.

But how can I migrate the roles of the old hostname to the new hostname?

 

Thank you.


Explorer

Hello, 

 

Could you explain the resolution of this problem in more detail?

 

Thanks