After upgrading the manager and all agents to 5.10, all services and hosts appear as "Unknown Health" and nothing works in the manager.
On the agents I see errors like:
[10/Feb/2017 19:27:31 +0000] 64743 MonitorDaemon-Reporter throttling_logger ERROR (12 skipped) Error sending messages to firehose: mgmt1-SERVICEMONITOR-4c5c24980753678ebe83e319f270d1e4
Traceback (most recent call last):
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.10.0-py2.7.egg/cmf/monitor/firehose.py", line 116, in _send
File "/usr/lib/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 469, in __init__
File "/usr/lib/python2.7/httplib.py", line 757, in connect
File "/usr/lib/python2.7/socket.py", line 571, in create_connection
error: [Errno 111] Connection refused
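
For anyone hitting the same traceback: "[Errno 111] Connection refused" means nothing accepted the TCP connection on the target host and port. A quick way to confirm that from an agent host is a small check like the one below. This is only a sketch. The hostname mgmt1.example.com is a placeholder, and port 9997 is my assumption for the Service Monitor listen port, so check the actual value in the Service Monitor configuration before relying on it.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.error as exc:
        # Covers refused connections, timeouts and name resolution failures.
        print("Cannot connect to %s:%s: %s" % (host, port, exc))
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # Placeholder host and assumed port, replace with your own values.
    host = "mgmt1.example.com"
    port = 9997
    if can_connect(host, port):
        print("Port %s on %s accepts connections" % (port, host))
```

If this returns False from the agent host but the Service Monitor process is running, the likely suspects are the process not listening yet (still starting), a different listen port, or a firewall between the hosts.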
Yes, and yes.
The only errors in cloudera-scm-server.log are about parcel distribution, nothing really worrying.
After trying a lot of things, I noticed that the services, not only the management services, are in fact running, but they stay in 'STARTING' state, which prevents CM from doing anything. For example, I can start HDFS: CM reports that the start failed, but I can connect to the NameNode web UI.
I get this error when starting any service:
"org.hibernate.PropertyAccessException: Exception occurred inside setter of com.cloudera.cmf.model.DbProcess.resourcesForDb"
Did all that, multiple times. Still the same.
The agents could always communicate with CM; I have a heartbeat for all of them.
But starting services still fails. I get this error each time:
and not a single warning or error in the agent and CM logs.