Cloudera Agent shut down

Contributor

Hello,

 

One of the Cloudera Agents shut down with this error:

 

[15/Nov/2016 19:48:05 +0000] 14910 MainThread agent        INFO     Stopping agent...
[15/Nov/2016 19:48:05 +0000] 14910 MainThread agent        INFO     No extant cgroups; unmounting any cgroup roots
[15/Nov/2016 19:48:05 +0000] 14910 MainThread agent        INFO     10 processes are being managed; Supervisor will continue to run.
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE Bus STOPPING
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('totoro.akainix.local', 9000)) shut down
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE Stopped thread '_TimeoutMonitor'.
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE Bus STOPPED
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE Bus STOPPING
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('totoro.akainix.local', 9000)) already shut down
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE No thread running for None.
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE Bus STOPPED
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE Bus EXITING
[15/Nov/2016 19:48:05 +0000] 14910 MainThread _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE Bus EXITED
[15/Nov/2016 19:48:05 +0000] 14910 MainThread agent        INFO     Cleaning up daemon
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 agent        INFO     Stopping agent...
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 agent        INFO     No extant cgroups; unmounting any cgroup roots
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 agent        ERROR    Shutdown callback failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2777, in stop
    f()
  File "/usr/lib64/python2.7/asyncore.py", line 409, in close
    self.socket.close()
  File "/usr/lib64/python2.7/asyncore.py", line 636, in close
    os.close(self.fd)
OSError: [Errno 9] Bad file descriptor
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 agent        INFO     10 processes are being managed; Supervisor will continue to run.
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 agent        ERROR    Shutdown callback failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2777, in stop
    f()
  File "/usr/lib64/python2.7/asyncore.py", line 409, in close
    self.socket.close()
  File "/usr/lib64/python2.7/asyncore.py", line 636, in close
    os.close(self.fd)
OSError: [Errno 9] Bad file descriptor
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE Bus STOPPING
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('totoro.akainix.local', 9000)) already shut down
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE No thread running for None.
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE Bus STOPPED
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE Bus STOPPING
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE HTTP Server cherrypy._cpwsgi_server.CPWSGIServer(('totoro.akainix.local', 9000)) already shut down
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE No thread running for None.
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE Bus STOPPED
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE Bus EXITING
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 _cplogging   INFO     [15/Nov/2016:19:48:05] ENGINE Bus EXITED
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 agent        ERROR    Shutdown callback failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.8.1-py2.7.egg/cmf/agent.py", line 2777, in stop
    f()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/pyinotify-0.9.3-py2.7.egg/pyinotify.py", line 1424, in stop
    self._pollobj.unregister(self._fd)
KeyError: 15
[15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 agent        INFO     Cleaning up daemon

And I can't restart the agent because it seems that it was uninstalled. All the services work fine; only the Cloudera agent stopped working.

 

Please help, I'm very lost with this.

 

Regards,

Joaquín
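For a long agent log it can help to pull out only the ERROR entries and the exception that ends each traceback. The following is a minimal sketch, assuming the log lines follow the layout shown in the excerpt above; the function name summarize_errors and the regular expression are illustrative and not part of Cloudera Manager.

```python
import re

# Matches agent log lines such as:
# [15/Nov/2016 19:48:05 +0000] 14910 Dummy-14 agent        ERROR    Shutdown callback failed.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<pid>\d+)\s+(?P<thread>\S+)\s+"
    r"(?P<module>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def summarize_errors(lines):
    """Return (timestamp, thread, message, exception) for each ERROR entry.

    `exception` is the last unindented line of the traceback that follows
    the entry, or None when no traceback follows. Hypothetical helper.
    """
    results = []
    current = None  # the ERROR entry whose traceback is being read
    for raw in lines:
        line = raw.rstrip("\n")
        match = LOG_LINE.match(line)
        if match:
            current = None
            if match.group("level") == "ERROR":
                current = [match.group("ts"), match.group("thread"),
                           match.group("msg"), None]
                results.append(current)
        elif current is not None and line and not line.startswith((" ", "Traceback")):
            # Unindented line inside a traceback: the exception itself.
            current[3] = line
    return [tuple(entry) for entry in results]
```

Passing the lines of the excerpt above to summarize_errors would yield three "Shutdown callback failed." entries, two ending in "OSError: [Errno 9] Bad file descriptor" and one in "KeyError: 15". All three are logged after "Stopping agent...", so they appear to be side effects of a shutdown already in progress rather than its cause.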

1 ACCEPTED SOLUTION

Master Guru

Hi,

 

There is no way to tell what happened to your agent, but it appears it was stopped manually.  You could try checking "last" and "history" to try to figure out if someone may have done that.

 

If your agent has been removed (did you check with "rpm -qa | grep cloudera"?), then you will need to add the agent back again and try starting it.

 

When you run "service cloudera-scm-agent start", what happens?
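The package check mentioned above can also be scripted, for example to run it across several hosts. This is a rough sketch, assuming an RPM-based system; the function names are hypothetical, and the filter simply mirrors "rpm -qa | grep cloudera".

```python
import shutil
import subprocess


def cloudera_packages(rpm_qa_output):
    """Keep only the lines of `rpm -qa` output that contain 'cloudera'."""
    return sorted(
        line.strip()
        for line in rpm_qa_output.splitlines()
        if "cloudera" in line
    )


def installed_cloudera_packages():
    """Run `rpm -qa` and filter it; return None when rpm is not available."""
    if shutil.which("rpm") is None:
        return None
    result = subprocess.run(
        ["rpm", "-qa"], capture_output=True, text=True, check=True
    )
    return cloudera_packages(result.stdout)
```

An empty list from installed_cloudera_packages() would suggest that no Cloudera packages are installed on the host, which would match an agent that cannot be started because its service no longer exists.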

 

 


3 REPLIES

Cloudera Employee

Hi,

 

Did you recently enable TLS encryption at any level [1], leaving the agent config file /etc/cloudera-scm-agent/config.ini on this particular bad node unconfigured? Compare config.ini with the one on a working node. If it differs in the settings that the documentation says to change, copy the working file over, then restart the CM server and the agent on the bad node.

 

Let me know if it helps.

 

~Salim

 

[1] http://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_config_tls_security.html
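The config.ini comparison suggested above could be done with a short script instead of by eye. The sketch below assumes both files parse as standard INI; diff_ini and load_ini are hypothetical helper names, not Cloudera tooling.

```python
import configparser


def load_ini(text):
    """Parse INI text into {section: {key: value}}."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {section: dict(parser.items(section)) for section in parser.sections()}


def diff_ini(good_text, bad_text):
    """Return {(section, key): (good_value, bad_value)} for settings that differ.

    A value of None means the setting is missing on that side.
    """
    good, bad = load_ini(good_text), load_ini(bad_text)
    diffs = {}
    for section in sorted(set(good) | set(bad)):
        good_items = good.get(section, {})
        bad_items = bad.get(section, {})
        for key in sorted(set(good_items) | set(bad_items)):
            if good_items.get(key) != bad_items.get(key):
                diffs[(section, key)] = (good_items.get(key), bad_items.get(key))
    return diffs
```

To use it, read /etc/cloudera-scm-agent/config.ini from a working node and from the bad node and pass both contents to diff_ini. Some host-specific differences may be legitimate, so the result is a list of candidates to review rather than a list of errors.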

Contributor

No one stopped or uninstalled the agent manually, because I'm the only one who manages that server. What I did that day was reinstall a MySQL server; I don't know if that is related to this issue.

 

Starting cloudera-scm-agent fails, which suggests that it was uninstalled:

Failed to start cloudera-scm-agent.service: Unit cloudera-scm-agent.service failed to load: No such file or directory.

So I reinstalled the agent and now it is working.

 

Thanks