Run custom script before service stop?

New Contributor

Hello!

Not sure if this is the right place, but...

We use StreamSets to load data into a series of databases within our HDFS cluster. However, each time the cluster is restarted, all the pipelines drop into the "START_ERROR" state when StreamSets starts. I assume this is because it tries to start multiple pipelines on a single StreamSets host at the same time.

Is there a way to get Cloudera Manager to run a script before it stops the StreamSets service? We already have the script, because we use it to stop the pipelines before doing any batch processing on the data. Currently we run it manually (it's just a series of curl calls into the StreamSets API).
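For reference, a stop script like the one described might look roughly like the following minimal Python sketch (standard library only). It is an illustration under assumptions, not StreamSets' own tooling: it assumes the Data Collector REST endpoint `POST /rest/v1/pipeline/<pipelineId>/stop`, the `X-Requested-By` header, HTTP basic auth, the default port 18630, and a hypothetical host, credentials and pipeline IDs. Check these against the REST API reference for your SDC version.

```python
import base64
import urllib.request

# Assumptions (adjust for your environment): hypothetical SDC host on the
# default port 18630, example basic-auth credentials, hypothetical pipeline IDs.
SDC_URL = "http://sdc-host.example.com:18630"
USER, PASSWORD = "admin", "admin"
PIPELINE_IDS = ["load-orders-pipeline", "load-customers-pipeline"]


def build_stop_request(base_url, pipeline_id, user, password):
    """Build (but do not send) a POST request asking SDC to stop one pipeline."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url}/rest/v1/pipeline/{pipeline_id}/stop",
        data=b"",  # no body needed; the action is in the URL
        method="POST",
        headers={
            # SDC expects this header on state-changing requests.
            "X-Requested-By": "sdc",
            "Authorization": f"Basic {token}",
        },
    )


def stop_all(base_url, pipeline_ids, user, password):
    """Stop each pipeline in turn, one request at a time."""
    for pid in pipeline_ids:
        req = build_stop_request(base_url, pid, user, password)
        with urllib.request.urlopen(req, timeout=30) as resp:
            print(pid, resp.status)
```

Calling `stop_all(SDC_URL, PIPELINE_IDS, USER, PASSWORD)` sends the requests one after another; `build_stop_request` is kept separate so a request can be inspected without contacting a server.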

We are currently running CDH 5.9.0 with Cloudera Manager 5.9.

Any advice would be gratefully received.

Thanks

Ben

Master Guru

@BenK,

Based on the information provided, I believe StreamSets is best placed to answer this, as they build the parcel and the CSD. When you restart, Cloudera Manager signals the StreamSets service to restart, but how that action is handled is defined at the CSD level, which the vendor (not Cloudera) creates.

I think this is where they handle user questions:

https://groups.google.com/a/streamsets.com/forum/#!forum/sdc-user

New Contributor
Thanks. I've asked on the StreamSets forum, and they've pointed me at a couple of items that have been implemented and will help (for example, preventing all the pipelines from starting at service start).
Unfortunately, the most promising fix is in StreamSets 3.0.0.0. Given that we're running CDH 5.9, we're currently on version 2.3.0.0.
Do you know whether version 3.0.0.0 is available for a later version of CDH?

The real reason for the failures appears to be that StreamSets starts before the HDFS and Hive services we use as data targets for the pipelines. If we could change the service startup order somewhere, that would probably help.

Master Guru

@BenK,

Cloudera Manager determines the order in which it starts services from internal dependencies, so there is no Cloudera Manager setting that could change it. I suspect the StreamSets fix may have been around dependencies in the CSD, but that's just a guess.

The only alternative I can see at this time is to bring the services up one at a time.

You could do that with the REST API, wrapping the calls in your script:

https://cloudera.github.io/cm_api/apidocs/v14/path__clusters_-clusterName-_services_-serviceName-_co...

The full REST API reference is here: https://cloudera.github.io/cm_api/apidocs/v14/

For instance, you could write a shell script that runs a series of "curl" commands to start the services.
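As a rough illustration, here is one such call sketched in Python rather than shell (standard library only). It assumes the `POST /clusters/{clusterName}/services/{serviceName}/commands/start` endpoint of the v14 API, Cloudera Manager on its default port 7180, HTTP basic auth, and a hypothetical host, cluster name and credentials; `build_start_request` is a hypothetical helper name.

```python
import base64
import urllib.parse
import urllib.request


def build_start_request(base_url, cluster, service, user, password):
    """Build (without sending) a CM API request that starts one service.

    Roughly equivalent to:
    curl -X POST -u user:password <base_url>/clusters/<cluster>/services/<service>/commands/start
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    # Quote the names: cluster names such as "Cluster 1" often contain spaces.
    url = (f"{base_url}/clusters/{urllib.parse.quote(cluster)}"
           f"/services/{urllib.parse.quote(service)}/commands/start")
    return urllib.request.Request(
        url,
        data=b"",  # no request body is needed for this command
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


# Hypothetical values: CM host on port 7180, API v14, example cluster name.
req = build_start_request("http://cm-host.example.com:7180/api/v14",
                          "Cluster 1", "hdfs", "admin", "admin")
# urllib.request.urlopen(req) would send it; CM replies with the queued command as JSON.
```

Building the request separately from sending it makes it easy to print or check the exact URL before pointing the script at a live cluster.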

One complication is that you need to wait until each service is really running before starting the next one. You can do that by calling:

https://cloudera.github.io/cm_api/apidocs/v14/path__clusters_-clusterName-_services_-serviceName-.ht...

and parsing out the status. Once a service reports "STARTED", start the next one:

 "serviceState" : "STARTED",
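Putting the pieces together, a minimal Python sketch of the start-then-wait loop could look like this. It assumes CM's default port 7180, API v14, HTTP basic auth, and hypothetical cluster and service names (the names CM uses for your HDFS, Hive and StreamSets services may differ; `GET /clusters/{clusterName}/services` lists them). The start and status calls are passed in as functions so the ordering logic can be exercised without a live cluster.

```python
import base64
import json
import time
import urllib.parse
import urllib.request

# Assumptions: hypothetical CM host on the default port 7180, API v14, and
# hypothetical cluster/service names; use the names CM shows for yours.
CM_URL = "http://cm-host.example.com:7180/api/v14"
CLUSTER = "Cluster 1"
START_ORDER = ["hdfs", "hive", "streamsets"]  # data targets first, StreamSets last


def _call(method, url, user, password):
    """Send one authenticated request to the CM API and decode the JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url,
        data=b"" if method == "POST" else None,
        method=method,
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def make_cm_client(base_url, cluster, user, password):
    """Return (start, get_state) functions bound to one CM cluster."""
    def service_url(name):
        return (f"{base_url}/clusters/{urllib.parse.quote(cluster)}"
                f"/services/{urllib.parse.quote(name)}")

    def start(name):
        _call("POST", service_url(name) + "/commands/start", user, password)

    def get_state(name):
        # GET .../services/<name> returns the service, including "serviceState".
        return _call("GET", service_url(name), user, password)["serviceState"]

    return start, get_state


def start_in_order(services, start, get_state, poll_secs=10, timeout_secs=900,
                   sleep=time.sleep):
    """Start services one at a time, waiting for each to report STARTED."""
    for name in services:
        if get_state(name) != "STARTED":  # skip services that are already up
            start(name)
        waited = 0
        while get_state(name) != "STARTED":
            if waited >= timeout_secs:
                raise TimeoutError(f"{name} did not reach STARTED within {timeout_secs}s")
            sleep(poll_secs)
            waited += poll_secs
```

Against a real cluster you would call something like `start, get_state = make_cm_client(CM_URL, CLUSTER, "admin", "admin")` followed by `start_in_order(START_ORDER, start, get_state)`. The timeout value is an arbitrary choice; the point is that a service stuck in a bad state stops the script instead of leaving StreamSets waiting forever.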

 

It might be worth experimenting with if you are stuck in this state long term. There are also Java and Python clients for the API, and you can find usage examples here:

https://cloudera.github.io/cm_api/