Created 05-19-2025 04:06 AM
Hi all,
I've created a custom service in Cloudera Manager (ABC_ROLE) and defined a startRunner that calls a control.sh script. This script eventually runs a start_and_monitor function that starts two internal services and enters a monitoring loop.
service.sdl Snippet:
"roles": [
  {
    "name": "ABC_ROLE",
    "startRunner": {
      "program": "scripts/control.sh",
      "args": ["start"],
      "environmentVariables": {
        "LOG_LEVEL": "${log_level}"
      }
    },
    "stopRunner": {
      "relevantRoleTypes": ["ABC_ROLE"],
      "runner": {
        "program": "scripts/control.sh",
        "args": ["stop"]
      }
    }
  }
],
"rollingRestart": {
  "workerSteps": {
    "roleName": "ABC_ROLE",
    "bringDownCommands": ["Stop"],
    "bringUpCommands": ["Start"]
  }
}
control.sh Snippet:
# control.sh
# CMD is set earlier in the script (not shown here).
start_service() {
    echo "start_service() called"
    sleep 60  # Temporary sleep added for testing
    # exec replaces this shell with the daemon script
    exec "${SERVICE_PARCEL_HOME}/service/scripts/service_daemon.sh" start_and_monitor
}

if [ "$CMD" = "start" ]; then
    start_service
elif [ "$CMD" = "stop" ]; then
    stop_service
elif [ "$CMD" = "status" ]; then
    status_service_and_uptime
fi
service_daemon.sh Snippet:
start_service_daemons() {
    echo "Starting first service..."
    "${SERVICE_PARCEL_HOME}/service1/bin/service1ctrl" start
    echo "Starting second service..."
    "${SERVICE_PARCEL_HOME}/service2/bin/service2ctrl" start
}

start_and_monitor() {
    # Stop both daemons when the supervisor sends SIGTERM
    trap stop_service_daemons SIGTERM
    umask 0027
    start_service_daemons
    sleep 5
    # Monitoring loop: exit non-zero as soon as either daemon is down
    while true; do
        "${SERVICE_PARCEL_HOME}/service1/bin/service1ctrl" status
        service1_status=$?
        "${SERVICE_PARCEL_HOME}/service2/bin/service2ctrl" status
        service2_status=$?
        if [ "$service1_status" -ne 0 ] || [ "$service2_status" -ne 0 ]; then
            echo "One or more services are not running. Exiting..."
            exit 1
        else
            echo "Services are running. Sleeping..."
            sleep 60
        fi
    done
}
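A side note on the trap in start_and_monitor: a shell runs a trap handler only after the current foreground command finishes, so a SIGTERM that arrives during the foreground sleep 60 may not be handled until that sleep ends. A minimal sketch of a workaround, assuming the stop signal is delivered only to the script and not to its children, is to run the sleep in the background and wait on it. The names interruptible_sleep, request_stop and terminate_requested are hypothetical and do not come from the original scripts.

```shell
#!/bin/sh
# Sketch only: helper names below are hypothetical, not part of the original scripts.

terminate_requested=0

# Trap handler: just record that a stop was requested.
request_stop() {
    terminate_requested=1
}

# Sleep that can be cut short by a trapped signal.
# "wait" returns early (status > 128) when a trapped signal arrives,
# and the trap handler runs right after it returns.
interruptible_sleep() {
    sleep "$1" &
    sleep_pid=$!
    wait "$sleep_pid" || true
    # Clean up the background sleep if it is still running.
    kill "$sleep_pid" 2>/dev/null || true
    return 0
}
```

In the monitoring loop this could be used as trap request_stop TERM once at the top, then interruptible_sleep 60 in place of sleep 60, followed by a check of terminate_requested that calls stop_service_daemons and exits.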
Problem:
During a rolling restart, Cloudera Manager proceeds to start the next service (e.g., HDFS) immediately after my custom service's startRunner is invoked, even though the internal services are still starting up.
I tried adding sleep and health checks in start_and_monitor, but Cloudera Manager does not even wait for the 60-second sleep in the loop. It seems to treat the service as "started" as soon as the startRunner is launched, not when the service is actually ready.
My service also needs to connect to a server to download required files during startup. This process must complete successfully before the service is considered ready.
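To make "ready" concrete as an exit code, the readiness condition can be wrapped in a bounded polling helper. The following is only a sketch of the script side: wait_until_ready and its arguments are hypothetical names, and whether Cloudera Manager's rolling restart actually waits on such a script is the open question of this post.

```shell
#!/bin/sh
# Sketch only: wait_until_ready is a hypothetical helper, not a Cloudera Manager API.
# Usage: wait_until_ready MAX_ATTEMPTS INTERVAL_SECONDS CHECK_COMMAND [ARGS...]
# Returns 0 as soon as CHECK_COMMAND exits 0, or 1 after MAX_ATTEMPTS failures.
wait_until_ready() {
    max_attempts="$1"
    interval="$2"
    shift 2
    attempt=1
    # "$@" is the readiness check; it must exit 0 when the service is ready.
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Not ready after ${max_attempts} attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
    return 0
}
```

For example, wait_until_ready 60 5 "${SERVICE_PARCEL_HOME}/service1/bin/service1ctrl" status || exit 1 would poll the first service's status every 5 seconds, for up to 60 attempts, and fail with a non-zero exit code if it never comes up. The same helper could wrap whatever command verifies that the required files were downloaded.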
Question:
Is there a way to make Cloudera Manager wait until a script returns success (exit code 0) before considering the service as started during a rolling restart?
Thanks in advance!
Created 07-06-2025 05:24 AM
Hi @vineetchaure, modifying the CDP/Cloudera Manager installation script is not recommended, as it may lead to unintended issues.
Created 07-07-2025 06:35 AM
Hi @vineetchaure,
Cloudera Manager itself manages the service startup scripts.
We do not recommend modifying the Cloudera service startup scripts, as this can interfere with correct startup and cause unwanted problems.