
Starting cloudera-scm-agent after next_start_clean deletes .pid file


While investigating cluster instability, I noticed that cloudera-scm-agent.pid is removed after a clean restart (on a CM 5.7 / CentOS 7 cluster).

 

(I used the commands listed in the official documentation.)

 

$ sudo service cloudera-scm-agent next_start_clean
$ sudo service cloudera-scm-agent start

 

 

Because the pid file is missing, the init script's stop command does not actually stop the running agent process, and the next time I start the agent, the init script allows a second agent process to run. I suspect that these duplicate agent processes led to the cluster instability.
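To check whether a host is in this state, the pid file can be compared against the processes that are actually running. The following is only a minimal sketch: the helper names `pidfile_is_live` and `count_matching` are hypothetical, and the pattern passed to `pgrep -f` is an assumption about how the agent shows up in the process list on a given host.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the pid file exists and names a live process.
# Note: kill -0 fails for processes owned by another user unless run as root.
pidfile_is_live() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1      # missing pid file: "stop" has nothing to signal
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1          # empty pid file is treated as stale
    kill -0 "$pid" 2>/dev/null         # signal 0 only tests that the process exists
}

# Hypothetical helper: count processes whose full command line matches a pattern.
count_matching() {
    pgrep -f "$1" | wc -l
}
```

For example, `pidfile_is_live /var/run/cloudera-scm-agent/cloudera-scm-agent.pid` failing while `count_matching cmf-agent` (the pattern is a guess; adjust it to what `ps` shows) reports one or more processes would match the symptom described above, and a count above one would suggest duplicate agents.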

 

It occurs on a CM/CDH5.7 cluster, but doesn't occur on a CM/CDH5.5.2 cluster (both clusters are based on CentOS7).

I also found that the default location of cloudera-scm-agent.pid changed between these two versions (from /var/run/ to /var/run/cloudera-scm-agent/, in /etc/init.d/cloudera-scm-agent). Judging from the contents of /usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.0-py2.7.egg/cmf/agent.py, the latter directory appears to be a target of rmtree().
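The pid file location can be compared across hosts or versions by printing the assignment from each init script. This is a small sketch; `show_pidfile_setting` is a hypothetical helper name, and it assumes the variable is assigned on a line beginning with `pidfile=`, as in the script quoted in this post.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) the lines of an init script
# that assign the "pidfile" variable.
show_pidfile_setting() {
    grep -n '^[[:space:]]*pidfile=' "$1"
}
```

Running `show_pidfile_setting /etc/init.d/cloudera-scm-agent` on a 5.5.2 host and a 5.7 host should make the difference in the default path visible side by side.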

 

Is it a good idea to work around this problem by changing /etc/init.d/cloudera-scm-agent as follows?

 

--- cloudera-scm-agent.orig 2016-05-06 17:06:11.558271813 +0900
+++ cloudera-scm-agent 2016-05-06 17:06:26.625264949 +0900
@@ -97,7 +97,7 @@
fi
#pid file
-pidfile=${PIDFILE-${CMF_VAR:-/var}/run/cloudera-scm-agent/$prog.pid}
+pidfile=${PIDFILE-${CMF_VAR:-/var}/run/$prog.pid}
# Marker files for working around systemd
clean_start_file=${CMF_VAR:-/var}/run/$prog/next_start_clean
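For reference, this is how the patched `pidfile` expression resolves under standard shell parameter expansion. The sketch below is illustrative only: `resolve_pidfile` is a hypothetical helper, `/opt/cm` and `/tmp/a.pid` are made-up values, and it does not show how (or whether) the real init script picks up `PIDFILE` or `CMF_VAR` from its environment.

```shell
#!/bin/sh
# Hypothetical helper: evaluate the patched pidfile expression.
#   $1 = value for PIDFILE (empty string means "leave unset")
#   $2 = value for CMF_VAR (empty string means "leave unset")
resolve_pidfile() {
    (
        prog=cloudera-scm-agent
        unset PIDFILE CMF_VAR
        if [ -n "$1" ]; then PIDFILE=$1; fi
        if [ -n "$2" ]; then CMF_VAR=$2; fi
        # ${VAR-default}  uses the default only when VAR is unset;
        # ${VAR:-default} uses it when VAR is unset or empty.
        echo "${PIDFILE-${CMF_VAR:-/var}/run/$prog.pid}"
    )
}

resolve_pidfile "" ""            # -> /var/run/cloudera-scm-agent.pid
resolve_pidfile "" /opt/cm       # -> /opt/cm/run/cloudera-scm-agent.pid
resolve_pidfile /tmp/a.pid ""    # -> /tmp/a.pid
```

With the patch applied and neither variable set, the pid file ends up directly under /var/run/, outside the /var/run/cloudera-scm-agent/ directory, while the next_start_clean marker file stays inside it.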

 
