Created on 11-04-2019 01:16 PM - edited on 11-04-2019 03:58 PM by Robert Justice
Many customers want to deploy a large number of CDSW models in their environments. Some also receive a high volume of model requests, which can cause requests to exceed the models' default 30-second timeout.
According to the CDSW documentation, model replicas are "the engines that serve incoming requests to the model." Models are single-threaded and can process only one request at a time.
Replicas give a model some degree of load balancing and fault tolerance, and allow it to serve multiple requests at once. A model can be deployed with a maximum of 9 replicas.
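To see why the 9-replica cap can become a bottleneck, a rough estimate helps. Because each replica handles one request at a time, the number of replicas kept busy is approximately the arrival rate multiplied by the time one request takes. The sketch below is only an illustration: `replicas_needed` is a hypothetical helper, and the request rate and latency are made-up numbers, not measurements from any CDSW deployment.

```shell
# Rough lower bound on the replicas needed to keep up with incoming requests.
# Each replica serves one request at a time, so:
#   replicas >= requests_per_second * seconds_per_request   (rounded up)
replicas_needed() {
  rate_per_sec=$1   # hypothetical: incoming requests per second
  latency_ms=$2     # hypothetical: time one request takes, in milliseconds
  # Integer ceiling of (rate * latency_ms / 1000)
  echo $(( (rate_per_sec * latency_ms + 999) / 1000 ))
}

# Example: 5 requests per second, each taking 3 seconds to serve.
replicas_needed 5 3000   # prints 15, which is more than the 9 allowed
```

If the estimate is above 9, requests will queue behind busy replicas, and some may wait long enough to hit the 30-second timeout.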
This UI limit within the model can be circumvented by scaling the model manually through Kubernetes commands.
NOTE: Please perform these steps at your own risk.
One can attempt the following to scale up their model deployment.
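The exact commands depend on the cluster, so the following is only a sketch of what such a sequence might look like. It assumes the model is backed by a Kubernetes Deployment (an assumption, not something this article confirms) and reuses the `default` namespace and `sample-model` name from the sample output as placeholders. `scale_cmd` is a hypothetical helper. The script only prints the commands so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Dry-run sketch: prints kubectl commands instead of executing them.
# ASSUMPTIONS: the model is managed by a Deployment; the namespace and
# name below are placeholders and must be replaced with real values.
NAMESPACE="default"
DEPLOYMENT="sample-model"
REPLICAS=10

# Build the scale command. $1 = namespace, $2 = deployment, $3 = replica count
scale_cmd() {
  printf 'kubectl scale deployment %s --replicas=%s --namespace=%s\n' "$2" "$3" "$1"
}

# 1. Find the deployment that backs the model.
echo "kubectl get deployments --all-namespaces"
# 2. Scale it past the limit of 9.
scale_cmd "$NAMESPACE" "$DEPLOYMENT" "$REPLICAS"
# 3. Watch the pods come back up.
echo "kubectl get pods --all-namespaces"
```

Once the printed commands look right for the environment, they can be run one at a time on a host with `kubectl` access to the cluster.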
Running `kubectl scale` terminates the existing pod and redeploys it with the additional containers inside that pod. The final result will look like this:
NAMESPACE   NAME           READY   STATUS    RESTARTS   AGE
default     sample-model   10/10   Running   0          23m
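The READY column can also be checked from a script rather than by eye. This is a minimal sketch, assuming the six-column layout shown above, where READY is the third field. `is_fully_ready` is a hypothetical helper name, and the input line is the sample row from this article, not live cluster output.

```shell
# Succeeds (exit status 0) when the READY field reports every container ready,
# for example "10/10". Expects one data row with NAMESPACE as the first column.
is_fully_ready() {
  printf '%s\n' "$1" | awk '{
    split($3, r, "/")                     # r[1] = ready count, r[2] = total
    exit !(r[1] == r[2] && r[2] > 0)      # exit 0 only when all are ready
  }'
}

line="default sample-model 10/10 Running 0 23m"   # sample row from above
if is_fully_ready "$line"; then
  echo "all containers ready"
else
  echo "still starting"
fi
```

In practice the row would come from `kubectl get pods --all-namespaces`, filtered to the model's name.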