
Many customers need to deploy a large number of CDSW Models in their environments.  Some customers also receive a high volume of model requests, which can cause requests to exceed the default 30-second timeout of these models.

According to the CDSW documentation, Model Replicas are "The engines that serve incoming requests to the model."  Models are single-threaded and can only process one request at a time.


Replicas give a model some level of load balancing and fault tolerance, and allow it to serve multiple requests at once.  The UI allows a maximum of 9 replicas per model.
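
As a rough illustration of why 9 replicas may not be enough, the sketch below estimates the minimum number of replicas needed to keep up with a steady request rate. It assumes each replica handles exactly one request at a time, as described above. The function name `min_replicas` and the input numbers are hypothetical and only for illustration. Bursty traffic would need more headroom than this average-case figure.

```shell
#!/bin/sh
# Hypothetical helper, not part of CDSW. Inputs are illustrative numbers,
# not measurements from a real deployment.
#
# min_replicas RATE_PER_SEC LATENCY_MS
#   RATE_PER_SEC  incoming requests per second (integer)
#   LATENCY_MS    time one replica spends on one request, in milliseconds
min_replicas() {
  rate="$1"
  latency_ms="$2"
  # ceil(rate * latency_ms / 1000) using integer arithmetic only
  echo $(( (rate * latency_ms + 999) / 1000 ))
}

# 5 requests per second, 2 seconds per request
min_replicas 5 2000   # prints 10
```

With these example numbers the estimate is 10 replicas, one more than the 9 the UI allows.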


This UI limit can be circumvented by scaling the model deployment manually with Kubernetes commands.


NOTE:  Perform these steps at your own risk.


To scale up a model deployment, try the following steps.


Find model deployments.

- kubectl get deployments --all-namespaces


Scale select deployment.

- kubectl scale deployments sample-model --replicas=10


Running `kubectl scale` will terminate the existing pod and redeploy it with additional containers within that pod.  The final result will look like this:


default sample-model 10/10 Running 0 23m
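
The scale step above could be wrapped in a small helper along the following lines. This is a minimal sketch, not an official CDSW tool: the function name `scale_model` and the `DRY_RUN` switch are hypothetical, and the `default` namespace is assumed from the sample output above. By default it only prints the `kubectl scale` command so it can be reviewed before anything is changed on the cluster.

```shell
#!/bin/sh
# Sketch only: scale_model and DRY_RUN are hypothetical names, not part of CDSW.
#
# scale_model DEPLOYMENT REPLICAS [NAMESPACE]
scale_model() {
  deployment="$1"
  replicas="$2"
  namespace="${3:-default}"   # assumed from the sample output in this article

  # Reject anything that is not a non-negative integer before touching the cluster.
  case "$replicas" in
    ''|*[!0-9]*)
      echo "replicas must be a non-negative integer, got: '$replicas'" >&2
      return 1
      ;;
  esac

  cmd="kubectl scale deployment $deployment --replicas=$replicas --namespace=$namespace"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: print the command instead of running it.
    echo "$cmd"
  else
    # DRY_RUN=0: actually run the command against the cluster.
    $cmd
  fi
}

# Prints the command for the example deployment used in this article.
scale_model sample-model 10
```

Setting `DRY_RUN=0` before calling `scale_model` runs the command for real; the same at-your-own-risk caveat applies.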