Support Questions

Find answers, ask questions, and share your expertise

Support for auto scaling

New Contributor

Are there plans to support elasticity and cluster auto-scaling? Services like Databricks and Qubole, and solutions like Cloudbreak, already do this. Cloudera seems to be stuck in manual provisioning and scaling. I would like each user to be able to provision their own cluster and have it scale automatically to accommodate Spark jobs.

1 ACCEPTED SOLUTION

Rising Star

Hi,

 

Regarding elasticity, you can grow and shrink clusters using Director. Auto-scaling is not an out-of-the-box feature; however, some users have used Director's API to automate scaling of the cluster based on metrics of their choosing. In any case, we have this on the roadmap and will factor your input into prioritization. If you can provide more details on your use case, that would be welcome too.

 

Thanks,

Vinithra


6 REPLIES


Explorer

Here's a use case: I have a cluster set up running Cloudera Express, using HDFS, ZooKeeper, Solr, and Accumulo to store and index data. Once a month I have a huge surge in my data ingestion and indexing load. I need to double the size of the Solr cluster for 5 days to speed up indexing during this surge, and then bring it back down to the previous size once ingestion and indexing are done.

 

How can I do this?

 

Rising Star

Hi,

 

I'm assuming you used Cloudera Director to set this cluster up.

 

To grow the cluster based on load, you first need to identify how you can programmatically determine that load has increased or is about to increase. Examples: a cron job triggered on a given day of the month, or the output of some monitoring that tells you the volume of data to be ingested has crossed a threshold.
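As a minimal sketch of that first step, a trigger check for this use case might look like the following. The surge window and queue-depth threshold are assumptions for illustration, not values Director provides:

```python
import datetime

# Assumptions for illustration: the monthly surge starts on the 1st and
# lasts 5 days, and a deep ingest queue also signals rising load.
SURGE_DAYS = range(1, 6)
QUEUE_DEPTH_THRESHOLD = 100_000

def should_scale_up(today: datetime.date, queue_depth: int) -> bool:
    """Return True when load has increased or is about to increase."""
    return today.day in SURGE_DAYS or queue_depth > QUEUE_DEPTH_THRESHOLD
```

A cron job could run this check and, when it returns True, kick off the scaling script described below.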

 

With that indicator of increasing load as input, write a script that makes a PUT request to the cluster update endpoint, with a body that describes the cluster with the increased instance count:

PUT /api/v5/environments/{environment}/deployments/{deployment}/clusters/{cluster}
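A minimal sketch of such a script, using only the Python standard library. The Director host, credentials, and the template field names (`virtualInstanceGroups`, `minCount`) are assumptions; check the API console in your Director UI for the exact cluster template shape:

```python
import base64
import copy
import json
import urllib.request

def grow_template(template: dict, group: str, new_count: int) -> dict:
    """Return a copy of a cluster template with one instance group's
    count raised. The 'virtualInstanceGroups'/'minCount' field names
    are assumptions about the template shape."""
    updated = copy.deepcopy(template)
    updated["virtualInstanceGroups"][group]["minCount"] = new_count
    return updated

def put_cluster(base_url: str, env: str, dep: str, cluster: str,
                template: dict, user: str, password: str):
    """PUT the updated template to the Director cluster update endpoint."""
    url = (f"{base_url}/api/v5/environments/{env}"
           f"/deployments/{dep}/clusters/{cluster}")
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url,
        data=json.dumps(template).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {auth}"},
        method="PUT",
    )
    return urllib.request.urlopen(req)
```

The mirror-image script with the original count brings the cluster back down after the surge.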

  

You can try out this API by going to the API console in your Director UI; there is a link to it in the upper right.

 

Is this something that you can start with? Were you expecting a different way of achieving this?

 

-Vinithra

 

Explorer

Sorry, I neglected to mention that this is all running in AWS, with no Director, just Cloudera Express.

 

Ingest load is just determined by queue depth, so when I get that huge monthly influx of data I can set up the cluster itself to autoscale easily enough. But scaling the cluster does me no good unless I can scale the services too; specifically, I think I just need to scale out Solr to handle the indexing load.

Rising Star

Hi,

 

Without Cloudera Director, this is going to be hard. You can try using the CM API to add hosts and grow the service by replicating what CM does for Solr: http://www.cloudera.com/documentation/enterprise/latest/topics/cm_intro_api.html
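As a rough sketch, the core of that CM API call is POSTing a role list to `/api/v*/clusters/{cluster}/services/{service}/roles`, one `SOLR_SERVER` role per new host. The body below follows the CM REST API's role-list shape; the API version, cluster/service names, and host IDs you would use are placeholders:

```python
import json

def solr_roles_body(host_ids):
    """Build the JSON body for
    POST /api/v*/clusters/{cluster}/services/{service}/roles,
    creating one SOLR_SERVER role per newly added host."""
    return json.dumps({
        "items": [
            {"type": "SOLR_SERVER", "hostRef": {"hostId": h}}
            for h in host_ids
        ]
    })
```

You would still need to commission the hosts in CM first and deploy client configuration afterwards; this only covers creating the roles.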

 

This will be much easier if you are using Director, as it does the work for you: growing the service, adding the hosts, registering them with CM, and so on.

Explorer

Vinithra,

 

Thanks, that's pretty much what I thought, but I wanted to make sure!

 

Berg