Created on 11-24-2016 07:00 AM - edited 11-24-2016 07:02 AM
Hello,
One of the Cloudera Agents shut down with this error:
[15/Nov/2016 19:48:05 +0000] 14910 MainThread agent INFO Stopping agent...
[15/Nov/2016 19:48:05 +0000] 14910 MainThread agent INFO No extant cgroups; unmounting any cgroup roots
[15/Nov/2016 19:48:05 +0000] 14910 MainThread agent INFO 10 processes are being managed; Supervisor will continue to run.
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging INFO [15/Nov/2016:19:48:05] ENGINE Bus STOPPING
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging INFO [15/Nov/2016:19:48:05] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('totoro.akainix.local', 9000)) shut down
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging INFO [15/Nov/2016:19:48:05] ENGINE Stopped thread '_TimeoutMonitor'.
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging INFO [15/Nov/2016:19:48:05] ENGINE Bus STOPPED
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging INFO [15/Nov/2016:19:48:05] ENGINE Bus STOPPING
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging INFO [15/Nov/2016:19:48:05] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('totoro.akainix.local', 9000)) already shut down
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging INFO [15/Nov/2016:19:48:05] ENGINE No thread running for None.
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging INFO [15/Nov/2016:19:48:05] ENGINE Bus STOPPED
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging INFO [15/Nov/2016:19:48:05] ENGINE Bus EXITING
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging INFO [15/Nov/2016:19:48:05] ENGINE Bus EXITED
[15/Nov/2016 19:48:05 +0000] 14910 MainThread agent INFO Cleaning up daemon
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 agent INFO Stopping agent...
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 agent INFO No extant cgroups; unmounting any cgroup roots
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 agent ERROR Shutdown callback failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2777, in stop
    f()
  File "/usr/lib64/python2.7/asyncore.py", line 409, in close
    self.socket.close()
  File "/usr/lib64/python2.7/asyncore.py", line 636, in close
    os.close(self.fd)
OSError: [Errno 9] Bad file descriptor
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 agent INFO 10 processes are being managed; Supervisor will continue to run.
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 agent ERROR Shutdown callback failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2777, in stop
    f()
  File "/usr/lib64/python2.7/asyncore.py", line 409, in close
    self.socket.close()
  File "/usr/lib64/python2.7/asyncore.py", line 636, in close
    os.close(self.fd)
OSError: [Errno 9] Bad file descriptor
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging INFO [15/Nov/2016:19:48:05] ENGINE Bus STOPPING
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging INFO [15/Nov/2016:19:48:05] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('totoro.akainix.local', 9000)) already shut down
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging INFO [15/Nov/2016:19:48:05] ENGINE No thread running for None.
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging INFO [15/Nov/2016:19:48:05] ENGINE Bus STOPPED
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging INFO [15/Nov/2016:19:48:05] ENGINE Bus STOPPING
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging INFO [15/Nov/2016:19:48:05] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('totoro.akainix.local', 9000)) already shut down
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging INFO [15/Nov/2016:19:48:05] ENGINE No thread running for None.
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging INFO [15/Nov/2016:19:48:05] ENGINE Bus STOPPED
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging INFO [15/Nov/2016:19:48:05] ENGINE Bus EXITING
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging INFO [15/Nov/2016:19:48:05] ENGINE Bus EXITED
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 agent ERROR Shutdown callback failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2777, in stop
    f()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/pyinotify-0.9.3-py2.7.egg/pyinotify.py", line 1424, in stop
    self._pollobj.unregister(self._fd)
KeyError: 15
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 agent INFO Cleaning up daemon
And I can't restart the agent because it seems that it was uninstalled. All the services work fine; only the Cloudera agent stopped working.
Please help, I'm very lost with this.
Regards,
Joaquín
Created 11-25-2016 08:33 PM
Hi,
There is no way to tell what happened to your agent, but it appears it was stopped manually. You could check "last" and "history" to figure out whether someone may have done that.
If your agent was removed (did you check with "rpm -qa | grep cloudera"?), then you will need to install the agent again and try starting it.
When you run "service cloudera-scm-agent start", what happens?
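If it helps, the package check above could be scripted roughly like this. It is only a sketch: it assumes an RPM-based host with Python 3.7 or later, the function names are made up for illustration, and the package name "cloudera-manager-agent" is my assumption about what the agent package is called on your system. The filter does the same thing as the "grep cloudera" step.

```python
import shutil
import subprocess


def cloudera_packages(rpm_output):
    """Return the lines of `rpm -qa` output that contain "cloudera" (like grep)."""
    return [line.strip() for line in rpm_output.splitlines() if "cloudera" in line]


def installed_cloudera_packages():
    """Run `rpm -qa` and filter it; return None when rpm is not available."""
    if shutil.which("rpm") is None:
        return None
    result = subprocess.run(["rpm", "-qa"], capture_output=True, text=True, check=False)
    return cloudera_packages(result.stdout)


if __name__ == "__main__":
    packages = installed_cloudera_packages()
    if packages is None:
        print("rpm not found; this check only applies to RPM-based systems")
    elif not any("cloudera-manager-agent" in name for name in packages):
        # Assumed package name; adjust it to whatever your repository provides.
        print("no agent package found; it probably needs to be reinstalled")
    else:
        print("\n".join(packages))
```

If no agent package shows up in the list, the "service cloudera-scm-agent start" command cannot work until the package is installed again.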
Created 11-25-2016 07:18 PM
Hi,
Did you recently enable any level of TLS encryption [1] without updating the agent config file /etc/cloudera-scm-agent/config.ini on this particular bad node? Compare its config.ini with the one on a working node. If it is missing the changes described in the documentation, copy the working file over, then restart the CM server and the agent on the bad node.
Let me know if it helps.
~Salim
[1] http://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_config_tls_security.html
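To make the config.ini comparison easier, something like the following could list the settings that differ between the two files. This is a rough sketch, not an official tool: the function name `config_differences` is made up, and it assumes both files parse as ordinary INI files with Python's standard `configparser`.

```python
import configparser


def config_differences(good_text, bad_text):
    """Return {(section, key): (good_value, bad_value)} for settings that differ.

    A value of None means the setting is missing from that file.
    """
    # interpolation=None keeps "%" in values literal; strict=False tolerates
    # duplicate keys instead of raising an error.
    good = configparser.ConfigParser(interpolation=None, strict=False)
    bad = configparser.ConfigParser(interpolation=None, strict=False)
    good.read_string(good_text)
    bad.read_string(bad_text)

    differences = {}
    for section in sorted(set(good.sections()) | set(bad.sections())):
        good_items = dict(good.items(section)) if good.has_section(section) else {}
        bad_items = dict(bad.items(section)) if bad.has_section(section) else {}
        for key in sorted(set(good_items) | set(bad_items)):
            if good_items.get(key) != bad_items.get(key):
                differences[(section, key)] = (good_items.get(key), bad_items.get(key))
    return differences


if __name__ == "__main__":
    # Illustrative contents only; in practice pass the text of each node's config.ini.
    working_node = "[General]\nserver_host=cm.example.com\n\n[Security]\nuse_tls=1\n"
    bad_node = "[General]\nserver_host=cm.example.com\n\n[Security]\nuse_tls=0\n"
    for (section, key), (good, bad) in config_differences(working_node, bad_node).items():
        print(f"[{section}] {key}: working={good!r} bad={bad!r}")
```

Some settings are expected to differ between nodes (anything host-specific), so treat the output as a list of things to review rather than a list of errors.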
Created 11-26-2016 06:57 AM
No one stopped or uninstalled the agent manually, because I'm the only one who manages that server. What I did that day was reinstall a MySQL server; I don't know if that is related to this issue.
Trying to start cloudera-scm-agent suggests that it was uninstalled:
Failed to start cloudera-scm-agent.service: Unit cloudera-scm-agent.service failed to load: No such file or directory.
So I reinstalled the agent and now it is working.
Thanks