StreamSets setup on CDH 5.9.1

Explorer

Hi,

We have installed and configured CDH 5.9.1 using Cloudera Director. We are trying to deploy StreamSets Data Collector 2.2 as a cluster service using a parcel. I could download, distribute, and activate the parcel from Cloudera Manager. However, when I tried to add the service, I could not find any option to choose StreamSets Data Collector. I tried StreamSets version 2.1 as well, but no luck.

Your help will be appreciated.

Thanks,

SP

1 ACCEPTED SOLUTION

Rising Star

I'm not sure whether Director officially supports this, but it can be done through the conf file with the help of a bootstrap script.

Specify a bootstrap script (only for the Cloudera Manager instance) that downloads the CSD jar and places it in the CSD directory, /opt/cloudera/csd:

instances {

  cminstance {
    type: m4.xlarge
    image: ami-ac5f2fcc

    tags {
      owner: ${?USER}
    }

    bootstrapScript: """#!/bin/sh
yum -y install wget
wget https://archives.streamsets.com/datacollector/2.3.0.0/csd/STREAMSETS-2.3.0.0.jar
mkdir -p /opt/cloudera/csd
mv STREAMSETS-2.3.0.0.jar /opt/cloudera/csd/
"""
  }

  ...
}

You also have to specify the product, service, and role names, along with the parcel repository URL, in the conf file. The following worked for me (I went through a manual install to get these values):

cluster {

  # add the StreamSets Data Collector product
  products {
    CDH: 5
    STREAMSETS_DATACOLLECTOR: 2.3
  }


  # add the StreamSets parcel repository
  parcelRepositories: ["http://archive.cloudera.com/cdh5/parcels/5.9/",
                       "https://archives.streamsets.com/datacollector/latest/parcel/"]

  # add the service
  services: [HDFS, YARN, STREAMSETS,...]

  ...

  workers {
    ...

    # add the Data Collector role to the StreamSets service
    roles {
      HDFS: [DATANODE]
      YARN: [NODEMANAGER]
      STREAMSETS: [DATACOLLECTOR]
      ...
    }
  }
}

5 REPLIES

Explorer
Many thanks, @aarman! It worked.

Contributor

Hello @aarman,
Could you please give me further assistance with the config files you mentioned?
I have an issue similar to @SAKTIPADA's.
I successfully installed and distributed the StreamSets Data Collector 3.5.0 parcel with Cloudera Manager on our CDH 5.14.1 cluster, but I cannot add the service in CM because there is no StreamSets option.

Thanks.

Contributor

My fault. I forgot to add the Custom Service Descriptor (CSD) to /opt/cloudera/csd/.
After restarting the Cloudera Manager server (service cloudera-scm-server restart), I was able to find StreamSets on the Add Service page.
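
For anyone hitting the same symptom, here is a minimal sketch of a check for the missing CSD. It assumes the CSD directory /opt/cloudera/csd mentioned in this thread and a jar name that starts with STREAMSETS; the function name csd_installed is hypothetical and not part of any Cloudera or StreamSets tooling.

```shell
#!/bin/sh
# Sketch only: report whether a StreamSets CSD jar is present in the
# directory where Cloudera Manager looks for custom service descriptors.
# The default path and the STREAMSETS*.jar pattern are assumptions taken
# from this thread; csd_installed is a hypothetical helper name.

csd_installed() {
    dir="${1:-/opt/cloudera/csd}"
    # Return success as soon as one matching regular file is found.
    for jar in "$dir"/STREAMSETS*.jar; do
        [ -f "$jar" ] && return 0
    done
    # No match: the glob stayed literal or the directory is missing.
    return 1
}
```

If the check fails, copy the CSD jar into the directory and then restart the server with service cloudera-scm-server restart, as described above.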

Rising Star

@Baris Glad you got it working. With Director 2.4+, you don't actually need the bootstrap script shown in my previous example. You can specify the CSD URL in the conf file, and Director will automatically download the CSD and place it in /opt/cloudera/csd/.

See the documentation here: https://www.cloudera.com/documentation/director/latest/topics/director_non-cdh_products_custom_descr...