$ sudo service cloudera-scm-agent next_start_clean
$ sudo service cloudera-scm-agent start
Because the pid file is missing, the init script's stop command does not actually stop the running agent process, and the next time I start the agent, the init script allows a second agent process to run. I suspect that the duplicate agent processes led to the cluster instability.
It occurs on a CM/CDH5.7 cluster, but doesn't occur on a CM/CDH5.5.2 cluster (both clusters are based on CentOS7).
I also found that the default location of cloudera-scm-agent.pid changed between these two versions, from /var/run/ to /var/run/cloudera-scm-agent/ (set in /etc/init.d/cloudera-scm-agent). Judging from the contents of /usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.0-py2.7.egg/cmf/agent.py, the latter directory seems to be a target of rmtree().
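For anyone who wants to confirm the same symptom, a check along these lines may help. It is only a minimal sketch: check_pidfile is a hypothetical helper name, not part of the Cloudera init script, and the pid file path in the usage comment is assumed from the new default directory mentioned above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Cloudera init script).
# Prints one of:
#   missing  - the pid file does not exist
#   running  - the pid file exists and the pid in it is alive
#   stale    - the pid file exists but is empty or points to a dead pid
# Run as root: kill -0 fails for processes owned by another user,
# which would make a live root-owned agent look "stale".
check_pidfile() {
    pidfile=$1
    if [ ! -f "$pidfile" ]; then
        echo "missing"
        return 1
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the pid can be signalled.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
        return 0
    fi
    echo "stale"
    return 2
}

# Example (path assumed from the new default directory):
#   check_pidfile /var/run/cloudera-scm-agent/cloudera-scm-agent.pid
```

If this prints "missing" while an agent process is still visible in ps, that would match the behaviour described above: the stop command has nothing to act on, and the next start launches a second agent.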
Is it a good idea to try to work around this problem by changing /etc/init.d/cloudera-scm-agent as follows?
--- cloudera-scm-agent.orig 2016-05-06 17:06:11.558271813 +0900
+++ cloudera-scm-agent 2016-05-06 17:06:26.625264949 +0900
@@ -97,7 +97,7 @@
# Marker files for working around systemd