Created on 06-16-2016 05:05 PM - edited 09-16-2022 03:25 AM
Are there plans to support elasticity and cluster auto-scaling? Services like Databricks and Qubole, and solutions like Cloudbreak, are already doing this. Cloudera seems to be stuck in manual provisioning and scaling; I would like each user to be able to provision their own cluster and have it automatically scale to accommodate Spark jobs.
Created 06-16-2016 05:35 PM
Hi,
Regarding elasticity, you can grow and shrink clusters using Director. Auto-scaling is not an out-of-the-box feature; however, some users have used Director's API to automate scaling of the cluster based on their own metrics. In any case, we have this on the roadmap and will take your input into account when prioritizing. If you can provide more details on your use case, that would be welcome too.
Thanks,
Vinithra
Created on 09-14-2016 04:35 AM - edited 09-14-2016 05:01 AM
Here's a use case: I have a cluster running Cloudera Express, using HDFS, ZooKeeper, Solr, and Accumulo to store and index data. Once a month I have a huge surge in data ingestion and indexing load, and I need to double the size of the Solr cluster for 5 days to speed up indexing for that surge, then bring it back down to its previous size once ingestion and indexing are done.
How can I do this?
Created 09-15-2016 05:14 PM
Hi,
I'm assuming you used Cloudera Director to set this cluster up.
To grow the cluster based on load, you first need to identify how you can programmatically detect that load has increased or is about to increase. Examples: a cron job triggered on a known day of the month, or the output of monitoring that tells you the volume of data to be ingested has crossed a threshold.
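The two triggers above can be combined into one check. A minimal sketch (the surge calendar and the queue-depth threshold are hypothetical values; tune both for your workload):

```python
import datetime

# Hypothetical surge calendar and backlog threshold -- substitute your own.
SURGE_DAYS = range(1, 6)       # e.g. the monthly surge runs on days 1-5
QUEUE_THRESHOLD = 100_000      # pending-ingest items that signal overload

def should_scale_up(queue_depth, today=None):
    """Return True when either trigger fires: the calendar says it is a
    surge day, or the ingest backlog has crossed the threshold."""
    today = today or datetime.date.today()
    return today.day in SURGE_DAYS or queue_depth >= QUEUE_THRESHOLD
```

A cron job could run this check periodically and call the scaling script only when it returns True.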
With that indicator of increasing load as input, write a script that makes a PUT request to the clusters update endpoint, with a body that describes the cluster with the increased instance count:
PUT /api/v5/environments/{environment}/deployments/{deployment}/clusters/{cluster}
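A sketch of building that PUT request in Python, using only the standard library. The Director hostname, environment/deployment/cluster names, and the abbreviated body shape are all hypothetical; a real cluster template carries the full virtual-instance-group definitions, not just a count.

```python
import json
import urllib.request

# Hypothetical Director server and cluster coordinates -- substitute your own.
DIRECTOR = "http://director.example.com:7189"
PATH = "/api/v5/environments/myenv/deployments/mydep/clusters/mycluster"

def build_scale_request(worker_count):
    """Build a PUT request whose body asks for `worker_count` workers.

    The body below is abbreviated for illustration; consult the API console
    for the full cluster template your Director version expects.
    """
    body = json.dumps({
        "name": "mycluster",
        "virtualInstanceGroups": {
            "workers": {"minCount": worker_count},  # grow/shrink via this count
        },
    }).encode("utf-8")
    return urllib.request.Request(
        DIRECTOR + PATH,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

# urllib.request.urlopen(build_scale_request(10))  # would send the request
```

The trigger script identified earlier would call this with the desired worker count, then issue a second PUT with the original count when the surge is over.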
You can try out this API in the API console in your Director UI; you can find a link to the API console in the upper right.
Is this something that you can start with? Were you expecting a different way of achieving this?
-Vinithra
Created 09-15-2016 05:23 PM
Sorry, I neglected to mention that this is all running in AWS, with no Director; I'm just using Cloudera Express.
Ingest load is just determined by queue depth, so when that huge influx of data arrives once a month I can set up the cluster to auto-scale easily enough. But scaling the cluster does me no good if I can't scale the services too... specifically, I think I just need to scale out Solr to handle the indexing load.
Created 09-16-2016 05:47 PM
Hi,
Without Cloudera Director, this is going to be hard. You can try using the CM API to add hosts and grow the Solr service by replicating what CM does when you add Solr roles through the UI: http://www.cloudera.com/documentation/enterprise/latest/topics/cm_intro_api.html
This will be much easier if you are using Director, as it does the work for you of growing the service: adding the hosts, registering them with CM, and so on.
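For illustration, here is a hedged sketch of the two CM API calls involved: attach a new host to the cluster, then create a SOLR_SERVER role on it. The CM hostname, cluster name, service name, and API version are all hypothetical placeholders, and the request bodies are abbreviated; check the CM API reference for your version before sending anything.

```python
import json
import urllib.request

# Hypothetical CM coordinates -- substitute your own host, cluster, service.
CM = "http://cm.example.com:7180/api/v13"

def add_solr_server(host_id, cluster="cluster1", service="solr"):
    """Build the two CM API requests needed to grow Solr onto a new host:
    1) attach the host to the cluster, 2) create a SOLR_SERVER role on it.
    Returns the requests without sending them."""
    attach = urllib.request.Request(
        f"{CM}/clusters/{cluster}/hosts",
        data=json.dumps({"items": [{"hostId": host_id}]}).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    role = urllib.request.Request(
        f"{CM}/clusters/{cluster}/services/{service}/roles",
        data=json.dumps({"items": [
            {"type": "SOLR_SERVER", "hostRef": {"hostId": host_id}},
        ]}).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    return attach, role

# for req in add_solr_server("host-42"):
#     urllib.request.urlopen(req)  # would send; CM credentials required
```

After creating the roles you would still need to start them (CM exposes role commands for that), and scaling back down means stopping and deleting the roles and removing the hosts in reverse order.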
Created 09-19-2016 05:41 AM
Vinithra,
Thanks, that's pretty much what I thought, but I wanted to make sure!
Berg