Run custom script before service stop?
Labels: Cloudera Manager
Created ‎07-11-2018 09:07 AM
Hello!
Not sure if this is the right place but...
We use StreamSets to load data into a series of databases within our HDFS cluster. However, each time the cluster is restarted, the pipelines all drop into the "START_ERROR" state when StreamSets starts. I assume this is because it's trying to start multiple pipelines on a single StreamSets host at the same time.
Is there a way of getting Cloudera Manager to run a script before it stops the StreamSets service? We already have the script, as we use it to stop the pipelines ahead of doing any batch processing on the data. Currently we run it manually (it is just a series of curl calls into the StreamSets API).
We are running CDH 5.9.0 with Cloudera Manager 5.9 currently.
Any advice would be gratefully received.
Thanks
Ben
Created ‎07-11-2018 11:10 AM
Based on the information provided, I believe the information you seek would best be provided by StreamSets as they build the parcel and CSD. When you restart, Cloudera Manager will signal the StreamSets service to restart, but the handling of that action is done at the CSD level which is created by the vendor (not Cloudera).
I think this is where they handle their questions:
https://groups.google.com/a/streamsets.com/forum/#!forum/sdc-user
Created ‎07-11-2018 11:22 PM
Unfortunately, the most promising fix is in StreamSets 3.0.0.0. Given we're running CDH 5.9, we are on version 2.3.0.0 currently.
Do you know if version 3.0.0.0 is in a later version of CDH?
The real reason for the failures appears to be that StreamSets is starting before the HDFS / Hive services that we're using as data targets for the pipelines. If we could change the order of service startup somewhere, that would probably help...
Created ‎07-12-2018 08:47 AM
Cloudera Manager's steps for starting servers are based on internal dependencies, so there isn't a configuration in Cloudera Manager that could be used to change it. I suspect that the StreamSets fix may have been around dependencies, in the CSD, but that's just a guess.
The only alternative I can see at this time is to bring the services up one at a time.
You could do that with the API and wrap it in a script.
You can see all the REST API documentation here: https://cloudera.github.io/cm_api/apidocs/v14/
For instance, you could write a shell script that executes a series of "curl" commands that start the services.
There is a complexity in that you need to wait until each service is really running before starting the next one, but that can be accomplished by querying the service through the API and parsing out the status. If a service is "STARTED", start the next one:
"serviceState" : "STARTED",
It might be something to play with if you are stuck in this state long term. Client libraries for the API are available in Java and Python, and you can find examples of usage here too:
https://cloudera.github.io/cm_api/
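To make the "start one service, wait for STARTED, then start the next" idea concrete, here is a minimal Python sketch that talks to the v14 REST API directly using only the standard library. The host, port, credentials, and cluster name are placeholders I have assumed for illustration, and the helper names (`cm_request`, `start_in_order`) are hypothetical; treat it as a starting point rather than a tested procedure for your cluster.

```python
import base64
import json
import time
import urllib.request
from urllib.parse import quote

# Assumed values for illustration only; replace them with your own.
CM_BASE = "http://cm-host.example.com:7180/api/v14"
CLUSTER = "cluster"
USERNAME = "admin"
PASSWORD = "admin"


def cm_request(method, path):
    """Send one authenticated request to the Cloudera Manager API and decode the JSON reply."""
    token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
    req = urllib.request.Request(
        CM_BASE + path,
        method=method,
        headers={"Authorization": "Basic " + token,
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def start_in_order(services, request=cm_request, poll_seconds=10, timeout_seconds=600):
    """Start each service in turn, waiting for serviceState == "STARTED" before moving on."""
    for name in services:
        path = f"/clusters/{quote(CLUSTER)}/services/{quote(name)}"
        # Ask Cloudera Manager to start this service.
        request("POST", path + "/commands/start")
        deadline = time.monotonic() + timeout_seconds
        # Poll the service until it reports STARTED, or give up after the timeout.
        while request("GET", path).get("serviceState") != "STARTED":
            if time.monotonic() > deadline:
                raise TimeoutError(f"{name} did not reach STARTED in time")
            time.sleep(poll_seconds)
```

A call such as `start_in_order(["hdfs", "hive", "streamsets"])` would then bring the services up in that order. Those service names are only examples; the real names are the ones returned by listing the services of your cluster through the API.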
