I have successfully installed and set up Kafka (KAFKA-3.1.1-188.8.131.52.p0.2) in Cloudera Manager (Cloudera Enterprise 5.14.3). I have also configured and set up a Splunk connector that lets Splunk consume Cloudera Audit data.
However, if something fails I have to launch the connect-distributed.sh script and register the Splunk Sink connector by hand. If the server is restarted, I have to log into it and manually run the two commands (starting connect-distributed.sh and the curl call that registers the connector) to get the distributed service (or maybe I should call it a role) running and registered with the Splunk service.
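For reference, the registration step (the curl call) can also be done from a script through the Kafka Connect REST API. The sketch below assumes the worker listens on the default REST port 8083 on localhost; the connector name `splunk-sink`, the topic, and the HEC URI/token are hypothetical placeholders, and the Splunk property names are assumed from the Splunk Connect for Kafka project, so check them against the connector version you deployed. It uses `PUT /connectors/<name>/config` instead of `POST /connectors`: PUT creates the connector if it is missing and updates it if it already exists, so re-running it after a restart does not fail with a conflict.

```python
import json
import urllib.request

# Assumed worker REST endpoint (8083 is the Kafka Connect default port).
CONNECT_URL = "http://localhost:8083"

# Hypothetical connector settings; replace every placeholder with real values.
SPLUNK_CONFIG = {
    "connector.class": "com.splunk.kafka.connect.SplunkSinkConnector",
    "tasks.max": "1",
    "topics": "cloudera-audit",                            # hypothetical topic
    "splunk.hec.uri": "https://splunk.example.com:8088",   # placeholder
    "splunk.hec.token": "REPLACE_ME",                      # placeholder
}


def build_register_request(connect_url, name, config):
    """Build an idempotent PUT /connectors/<name>/config request.

    The body is the bare config map (no "name"/"config" wrapper),
    which is what the PUT endpoint expects.
    """
    url = f"{connect_url.rstrip('/')}/connectors/{name}/config"
    return urllib.request.Request(
        url,
        data=json.dumps(config).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def register_connector(connect_url, name, config, timeout=10):
    """Send the request; returns 201 if created, 200 if an existing one was updated."""
    req = build_register_request(connect_url, name, config)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling `register_connector(CONNECT_URL, "splunk-sink", SPLUNK_CONFIG)` once the worker's REST port is up should be equivalent to the manual curl registration.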
Is there a way to run scripts automatically when Cloudera Manager is used to restart Kafka?
If not, I'm thinking of writing a Python-based framework that runs from cron, checks the health of the connect-distributed.sh service, and re-runs it if it is down.
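A minimal sketch of such a cron watchdog, assuming the worker's REST API is on localhost:8083. Every path, the connector name, and the log location are hypothetical placeholders. On each run it calls `GET /connectors/<name>/status` and picks one action: start the worker if the REST port is unreachable, register the connector on a 404 (from a JSON file holding the config map), restart a FAILED connector or FAILED tasks, or do nothing. The decision logic is a pure function so it can be tested without a running cluster.

```python
import json
import subprocess
import urllib.error
import urllib.request

# Hypothetical placeholders; adjust to your installation.
CONNECT_URL = "http://localhost:8083"  # assumed default Connect REST port
CONNECTOR_NAME = "splunk-sink"
CONNECTOR_CONFIG_FILE = "/path/to/splunk-sink-config.json"  # bare config map as JSON
CONNECT_SCRIPT = "/path/to/connect-distributed.sh"
WORKER_PROPERTIES = "/path/to/connect-distributed.properties"
WORKER_LOG = "/tmp/connect-distributed.log"


def fetch_status(connect_url, name, timeout=5):
    """GET /connectors/<name>/status -> (http_code, parsed_json).

    http_code is None when the REST API cannot be reached at all.
    """
    url = f"{connect_url.rstrip('/')}/connectors/{name}/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:  # worker is up but returned an error code
        return err.code, None
    except OSError:  # URLError, connection refused and timeouts are all OSErrors
        return None, None


def failed_task_ids(status):
    """IDs of tasks reported in state FAILED."""
    return [t["id"] for t in status.get("tasks", []) if t.get("state") == "FAILED"]


def decide_action(code, status):
    """Map the status call result to a single action name."""
    if code is None:
        return "start_worker"
    if code == 404:
        return "register_connector"
    if code != 200 or status is None:
        return "wait"  # e.g. 409 during a rebalance; try again on the next run
    if status.get("connector", {}).get("state") == "FAILED":
        return "restart_connector"
    if failed_task_ids(status):
        return "restart_tasks"
    return "ok"


def _request(method, path, body=None):
    """Send a request to the worker's REST API and discard the response."""
    req = urllib.request.Request(
        f"{CONNECT_URL.rstrip('/')}{path}",
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10).close()


def main():
    code, status = fetch_status(CONNECT_URL, CONNECTOR_NAME)
    action = decide_action(code, status)
    if action == "start_worker":
        # Start detached in a new session so the worker outlives this cron job.
        with open(WORKER_LOG, "ab") as log:
            subprocess.Popen(
                [CONNECT_SCRIPT, WORKER_PROPERTIES],
                stdout=log,
                stderr=subprocess.STDOUT,
                start_new_session=True,
            )
    elif action == "register_connector":
        # PUT .../config creates the connector, or updates it if it exists.
        with open(CONNECTOR_CONFIG_FILE, "rb") as f:
            _request("PUT", f"/connectors/{CONNECTOR_NAME}/config", f.read())
    elif action == "restart_connector":
        _request("POST", f"/connectors/{CONNECTOR_NAME}/restart")
    elif action == "restart_tasks":
        for task_id in failed_task_ids(status):
            _request("POST", f"/connectors/{CONNECTOR_NAME}/tasks/{task_id}/restart")
    print(f"{CONNECTOR_NAME}: {action}")
```

Saved as a module, it could be run every minute with a crontab entry such as `* * * * * python3 -c "from connect_watchdog import main; main()"` (module name hypothetical). Registration deliberately happens on the run after the worker is started, which avoids sleeping while the REST port comes up. If the worker takes longer than the cron interval to open its port, this sketch could launch a second copy, so a PID or lock-file check would be worth adding.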
Now the question is: given that Hortonworks has its DataFlow products, which Cloudera now owns, and the ability to manage schemas in Kafka, does the Schema Manager manage the connect-distributed service, or is it even needed with CDF now?