How to add Delay during Rolling Restart Until Custom Service Fully Starts?

Frequent Visitor

Hi all,

I've created a custom service in Cloudera Manager (ABC_ROLE) and defined a startRunner that calls a control.sh script. This script eventually runs a start_and_monitor function that starts two internal services and enters a monitoring loop.

service.sdl Snippet:

"roles": [
  {
    "name": "ABC_ROLE",
    "startRunner": {
      "program": "scripts/control.sh",
      "args": ["start"],
      "environmentVariables": {
        "LOG_LEVEL": "${log_level}"
      }
    },
    "stopRunner": {
      "relevantRoleTypes": ["ABC_ROLE"],
      "runner": {
        "program": "scripts/control.sh",
        "args": ["stop"]
      }
    }
  }
],
"rollingRestart": {
  "workerSteps": {
    "roleName": "ABC_ROLE",
    "bringDownCommands": ["Stop"],
    "bringUpCommands": ["Start"]
  }
}

control.sh Snippet:

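CMD="$1"  # action passed via "args" in service.sdl: start, stop, or status

start_service() {
    echo "start_service() called"
    sleep 60  # temporary delay added for testing
    # exec replaces this shell with the daemon script
    exec "${SERVICE_PARCEL_HOME}/service/scripts/service_daemon.sh" start_and_monitor
}

# stop_service and status_service_and_uptime are not shown in this snippet
if [ "$CMD" = "start" ]; then
    start_service
elif [ "$CMD" = "stop" ]; then
    stop_service
elif [ "$CMD" = "status" ]; then
    status_service_and_uptime
fi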

service_daemon.sh Snippet:

start_service_daemons() {
    echo "Starting first service..."
    "${SERVICE_PARCEL_HOME}/service1/bin/service1ctrl" start

    echo "Starting second service..."
    "${SERVICE_PARCEL_HOME}/service2/bin/service2ctrl" start
}

start_and_monitor() {
    # Stop both daemons on SIGTERM
    trap stop_service_daemons SIGTERM
    umask 0027

    start_service_daemons

    # Give the daemons a moment before the first status check
    sleep 5

    while true; do
        "${SERVICE_PARCEL_HOME}/service1/bin/service1ctrl" status
        service1_status=$?

        "${SERVICE_PARCEL_HOME}/service2/bin/service2ctrl" status
        service2_status=$?

        if [ "$service1_status" -ne 0 ] || [ "$service2_status" -ne 0 ]; then
            echo "One or more services are not running. Exiting..."
            exit 1
        else
            echo "Services are running. Sleeping..."
            sleep 60
        fi
    done
}
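
A side note on the monitoring loop above: bash defers a trap handler until the current foreground command finishes, so a SIGTERM that arrives during the `sleep 60` may not reach `stop_service_daemons` for up to a minute. One common workaround is to run the sleep in the background and `wait` on it. The sketch below shows that idea; `interruptible_sleep` is a hypothetical helper name and is not part of the original scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original scripts): a sleep that a trapped
# signal can cut short, because `wait` returns as soon as the signal arrives.
interruptible_sleep() {
    local seconds="$1"
    sleep "$seconds" &
    local sleep_pid=$!
    local rc=0
    # wait returns a status above 128 if a trapped signal interrupts it
    wait "$sleep_pid" || rc=$?
    # After an interrupted wait the background sleep may still be running
    kill "$sleep_pid" 2>/dev/null || true
    return "$rc"
}
```

Inside the loop, `interruptible_sleep 60` would take the place of `sleep 60`. The trap handler then runs right after the signal is delivered instead of after the full minute.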

Problem:
During a rolling restart, Cloudera Manager proceeds to the next service (e.g., HDFS) immediately after my custom service's startRunner is invoked, even though the internal services are still starting up.

I tried adding sleep and health checks in start_and_monitor, but Cloudera Manager does not even wait for the 60-second sleep in the loop. It seems to treat the service as "started" as soon as the startRunner is launched, not when the service is actually ready.

My service also needs to connect to a server to download required files during startup. This process must complete successfully before the service is considered ready.
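
To make the readiness condition concrete, here is a minimal sketch of a "wait until ready" helper that polls a check command and exits 0 only once it succeeds, or non-zero after a timeout. The function name `wait_until_ready`, the timeout, and the interval are all hypothetical. Whether such a script can gate the rolling restart depends on the CSD framework. One possibility, which is an assumption to verify against the CSD reference, is to expose it as an additional role command listed after "Start" in bringUpCommands.

```shell
#!/usr/bin/env bash
# Sketch only: block until a readiness check succeeds or a timeout expires.
# wait_until_ready and its arguments are hypothetical, not part of any CSD API.
wait_until_ready() {
    local timeout_secs="$1"    # maximum total time to wait
    local interval_secs="$2"   # pause between attempts
    shift 2                    # remaining arguments form the check command
    local deadline=$(( SECONDS + timeout_secs ))

    # Re-run the check until it exits 0
    until "$@"; do
        if (( SECONDS >= deadline )); then
            echo "Not ready after ${timeout_secs}s: $*" >&2
            return 1
        fi
        sleep "$interval_secs"
    done
    echo "Ready: $*"
    return 0
}
```

For example, `wait_until_ready 300 5 "${SERVICE_PARCEL_HOME}/service1/bin/service1ctrl" status` would retry the status check every 5 seconds for up to 5 minutes. A check for the downloaded files could be passed in the same way.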

Question:
Is there a way to make Cloudera Manager wait until a script returns success (exit code 0) before it considers the service started during a rolling restart?

Thanks in advance!

2 Replies

Master Collaborator

Hi @vineetchaure, modifying the CDP/Cloudera Manager installation script is not recommended, as it may lead to unintended issues.

Expert Contributor

Hi @vineetchaure,

Cloudera Manager itself manages the service startup scripts.

We do not recommend modifying the Cloudera service startup scripts, as doing so can interfere with correct startup and cause unwanted problems.