I have provisioned an Azure HDInsight Hadoop [3.6] and Spark [2.1] cluster and made some configuration changes to add custom attributes [core-site.xml, mapred-site.xml, spark-env.sh, spark-defaults.sh].
After making those changes, I restart the affected services with the functions below, in the order listed after them:
stop_service() {
    if [ -z "$1" ]; then
        echo "[$(date)] [${USER}] Need service name to stop service"
        exit 1
    fi
    SERVICENAME=$1
    echo "[$(date)] [${USER}] Stopping $SERVICENAME"
    if [[ $SERVICENAME =~ SPARK.* ]]; then
        # The JSON body is double-quoted so that the shell expands $SERVICENAME and $CLUSTERNAME.
        curl -u "$USERID:$PASSWD" -sS -H "X-Requested-By: ambari" -X PUT -d "{\"RequestInfo\":{\"context\":\"_PARSE_.STOP.$SERVICENAME\",\"operation_level\":{\"level\":\"SERVICE\",\"cluster_name\":\"$CLUSTERNAME\",\"service_name\":\"$SERVICENAME\"}},\"Body\":{\"ServiceInfo\":{\"state\":\"INSTALLED\"}}}" "https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/$SERVICENAME"
    else
        curl -u "$USERID:$PASSWD" -i -H 'X-Requested-By: ambari' -X PUT -d "{\"RequestInfo\": {\"context\": \"Stop $SERVICENAME via REST\"}, \"Body\": {\"ServiceInfo\": {\"state\": \"INSTALLED\"}}}" "http://$ACTIVEAMBARIHOST:$PORT/api/v1/clusters/$CLUSTERNAME/services/$SERVICENAME"
    fi
    # Give Ambari a moment before the next request is sent.
    sleep 10
}
start_service() {
    if [ -z "$1" ]; then
        echo "[$(date)] [${USER}] Need service name to start service"
        exit 1
    fi
    sleep 10
    SERVICENAME=$1
    echo "[$(date)] [${USER}] Starting $SERVICENAME"
    # The JSON bodies are double-quoted so that the shell expands $SERVICENAME and $CLUSTERNAME.
    if [[ $SERVICENAME =~ SPARK.* ]]; then
        startResult=$(curl -u "$USERID:$PASSWD" -sS -H "X-Requested-By: ambari" -X PUT -d "{\"RequestInfo\":{\"context\":\"_PARSE_.START.$SERVICENAME\",\"operation_level\":{\"level\":\"SERVICE\",\"cluster_name\":\"$CLUSTERNAME\",\"service_name\":\"$SERVICENAME\"}},\"Body\":{\"ServiceInfo\":{\"state\":\"STARTED\"}}}" "https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/$SERVICENAME")
    else
        startResult=$(curl -u "$USERID:$PASSWD" -i -H 'X-Requested-By: ambari' -X PUT -d "{\"RequestInfo\": {\"context\": \"Start $SERVICENAME via REST\"}, \"Body\": {\"ServiceInfo\": {\"state\": \"STARTED\"}}}" "http://$ACTIVEAMBARIHOST:$PORT/api/v1/clusters/$CLUSTERNAME/services/$SERVICENAME")
    fi
    # Retry once after a minute if the response contains a known error message.
    if [[ $startResult == *"500 Server Error"* || $startResult == *"400 Bad Request"* || $startResult == *"internal system exception occurred"* ]]; then
        sleep 60
        echo "[$(date)] [${USER}] Retry starting $SERVICENAME"
        if [[ $SERVICENAME =~ SPARK.* ]]; then
            startResult=$(curl -u "$USERID:$PASSWD" -sS -H "X-Requested-By: ambari" -X PUT -d "{\"RequestInfo\":{\"context\":\"_PARSE_.START.$SERVICENAME\",\"operation_level\":{\"level\":\"SERVICE\",\"cluster_name\":\"$CLUSTERNAME\",\"service_name\":\"$SERVICENAME\"}},\"Body\":{\"ServiceInfo\":{\"state\":\"STARTED\"}}}" "https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/$SERVICENAME")
        else
            startResult=$(curl -u "$USERID:$PASSWD" -i -H 'X-Requested-By: ambari' -X PUT -d "{\"RequestInfo\": {\"context\": \"Start $SERVICENAME via REST\"}, \"Body\": {\"ServiceInfo\": {\"state\": \"STARTED\"}}}" "http://$ACTIVEAMBARIHOST:$PORT/api/v1/clusters/$CLUSTERNAME/services/$SERVICENAME")
        fi
    fi
    echo "$startResult"
}
Note: I have to use "RequestInfo":{"context":"_PARSE_.STOP.$SERVICENAME" .... to stop the SPARK2 service; otherwise the Spark Thrift Server still shows a stale state and needs to be restarted.
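A minimal sketch of how the request body from the note could be assembled so that the service and cluster names are substituted before the request is sent. `build_service_payload` is a hypothetical helper name, not part of the original script, and the `_PARSE_.START.<service>` context for starting is an assumption modelled on the STOP form.

```shell
# Hypothetical helper: prints the JSON body for a service state change.
#   $1 = action used in the context string (STOP or START; START is an assumption)
#   $2 = Ambari service name, e.g. SPARK2
#   $3 = cluster name
#   $4 = target state (INSTALLED means stopped, STARTED means running)
build_service_payload() {
    # printf fills the five %s placeholders in order:
    # action, service, cluster, service, state.
    printf '{"RequestInfo":{"context":"_PARSE_.%s.%s","operation_level":{"level":"SERVICE","cluster_name":"%s","service_name":"%s"}},"Body":{"ServiceInfo":{"state":"%s"}}}\n' \
        "$1" "$2" "$3" "$2" "$4"
}
```

Printing the body with `build_service_payload STOP SPARK2 "$CLUSTERNAME" INSTALLED` before passing it to `curl -d` makes it easy to check that no literal `$SERVICENAME` is left in the JSON.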
[Spark maintenance mode is On]
stop_service SPARK2
stop_service OOZIE
stop_service HIVE
stop_service MAPREDUCE2
stop_service YARN
stop_service HDFS
start_service HDFS
start_service YARN
start_service MAPREDUCE2
start_service HIVE
start_service OOZIE
start_service SPARK2
[Spark maintenance mode is Off]
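The two maintenance-mode steps in the sequence above could be scripted in the same way as the stop and start calls. This is a sketch under assumptions: it relies on the `maintenance_state` field of the Ambari service resource, reuses the `USERID`, `PASSWD` and `CLUSTERNAME` variables from the script, and the names `maintenance_payload` and `set_maintenance` as well as the context text are made up for illustration.

```shell
# Hypothetical helper: prints the JSON body that switches maintenance mode.
#   $1 = ON or OFF
maintenance_payload() {
    case $1 in
        ON|OFF) ;;
        *) echo "maintenance mode must be ON or OFF" >&2; return 1 ;;
    esac
    printf '{"RequestInfo":{"context":"Turn %s Maintenance Mode"},"Body":{"ServiceInfo":{"maintenance_state":"%s"}}}\n' "$1" "$1"
}

# Hypothetical wrapper: sends the body to the service resource.
#   $1 = service name, e.g. SPARK2
#   $2 = ON or OFF
set_maintenance() {
    body=$(maintenance_payload "$2") || return 1
    curl -u "$USERID:$PASSWD" -sS -H "X-Requested-By: ambari" -X PUT -d "$body" \
        "https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/$1"
}
```

With these, the sequence would begin with `set_maintenance SPARK2 ON` and end with `set_maintenance SPARK2 OFF`.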
After all these restarts the services usually come up successfully, but sometimes services go into an unknown state and I see a yellow question mark in the Ambari UI. [The stop and start calls both return 200 instead of 202 (Accepted).]
But if I run the same script again to provision another cluster of the same type, it works.
Why is there this kind of inconsistency, and what is the best way to restart a service after making configuration changes?
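On the unknown-state symptom, one option is to poll the service state after each request instead of relying on a fixed `sleep 10`, regardless of whether the PUT returned 200 or 202. The sketch below assumes that a GET on the service URL with `?fields=ServiceInfo/state` returns JSON containing a `"state"` field; `fetch_service_json`, `extract_state` and `wait_for_state` are hypothetical names, and the attempt count and delay are arbitrary defaults.

```shell
# Hypothetical helper: fetches the current state document for service $1.
fetch_service_json() {
    curl -u "$USERID:$PASSWD" -sS -H "X-Requested-By: ambari" \
        "https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/$1?fields=ServiceInfo/state"
}

# Reads JSON on stdin and prints the value of the first "state" field.
# Uses grep and sed so that no JSON tool is required.
extract_state() {
    grep -o '"state"[[:space:]]*:[[:space:]]*"[A-Z_]*"' | head -n 1 | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Polls until service $1 reports state $2.
#   $3 = maximum number of attempts (default 30)
#   $4 = seconds to wait between attempts (default 10)
# Returns 0 once the state matches, 1 if it never does.
wait_for_state() {
    service=$1
    target=$2
    attempts=${3:-30}
    delay=${4:-10}
    i=1
    while [ "$i" -le "$attempts" ]; do
        state=$(fetch_service_json "$service" | extract_state)
        if [ "$state" = "$target" ]; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "[$(date)] $service is in state '${state:-unknown}', expected $target" >&2
    return 1
}
```

For example, `stop_service HDFS` could be followed by `wait_for_state HDFS INSTALLED`, and `start_service HDFS` by `wait_for_state HDFS STARTED`, so the next service is only touched once the previous one has settled.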