Created on 08-31-2020 02:25 PM - edited on 09-03-2020 12:06 AM by VidyaSargur
Cloudera Data Warehouse (CDW) delivers highly efficient compute isolation and rapid scale-up and scale-down of data warehousing workloads, leveraging the latest container and caching technologies.
One of the great features of this architecture is that compute is brought online only on demand, as illustrated by the figure below:
This default setup is the most cost-effective, as only a few shared-services nodes (small nodes running services such as UIs, Viz, ZooKeeper, etc.) are long-lived. Each Virtual Warehouse has a set of nodes that run only when compute is needed (i.e., a new query on a non-cached dataset).
The caveat to this approach is that, on a completely cold warehouse, the warm-up time from zero to available compute is a minute or two.
An alternative to this default architecture is to leverage compute-reserved nodes, which are shared between Virtual Warehouses, as depicted below:
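Under the hood, reserved capacity like this is typically implemented with the cluster-overprovisioner pattern (visible in the Helm labels of the deployments shown below): low-priority placeholder "pause" pods request real CPU and memory so the autoscaler keeps nodes alive, and they are evicted the moment a real workload needs the space. A minimal sketch of that pattern, where the PriorityClass value and the resource requests are illustrative assumptions and not CDW's actual settings:

```yaml
# Illustrative sketch of the cluster-overprovisioner pattern.
# PriorityClass value and resource requests are assumptions, not CDW's values.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10              # lower than any real workload, so these pods are preempted first
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: compute-reserved-node-example
spec:
  replicas: 3           # one placeholder pod per node to hold warm
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: k8s.gcr.io/pause:3.2   # does nothing; only its resource requests matter
        resources:
          requests:
            cpu: "7"                  # sized to occupy roughly one executor node
            memory: "30Gi"
```

Because the pause pods carry a negative priority, the scheduler preempts them as soon as a Virtual Warehouse needs the node, so the reservation costs capacity but never blocks real work.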
Point kubectl at the CDW cluster and list the deployments; the reserved-node deployments ship with zero replicas by default:
$ export KUBECONFIG=[path_to_your_kubeconfig]
$ kubectl get deployments -n cluster
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
ardent-ferret-efs-provisioner       2/2     2            2           4h46m
compute-reserved-node               0/0     0            0           4h46m
crusty-abalone-cluster-autoscaler   1/1     1            1           4h46m
nginx-default-backend               1/1     1            1           4h46m
nginx-service                       3/3     3            3           4h46m
shared-services-reserved-node       0/0     0            0           4h46m
Edit the compute-reserved-node deployment and set spec.replicas to the number of nodes you want held in reserve (here, 3):
$ kubectl edit deployment compute-reserved-node -n cluster
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2020-08-31T16:28:52Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: trendy-mastiff
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: cluster-overprovisioner
    cluster-overprovisioner-name: compute-reserved-node
    helm.sh/chart: cluster-overprovisioner-0.2.5
  name: compute-reserved-node
  namespace: cluster
  resourceVersion: "3476"
  selfLink: /apis/extensions/v1beta1/namespaces/cluster/deployments/compute-reserved-node
  uid: a5cb9ea1-729a-4665-9734-94c2f669984f
spec:
  progressDeadlineSeconds: 600
  replicas: 3
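If you prefer a non-interactive change (for scripting or automation), standard kubectl commands can make the same replica change without opening an editor; these are equivalents to the interactive edit above:

```shell
# Scale the reserved-node deployment directly
kubectl scale deployment compute-reserved-node -n cluster --replicas=3

# Or patch only the replicas field
kubectl patch deployment compute-reserved-node -n cluster \
  -p '{"spec":{"replicas":3}}'
```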
After a few minutes, you should see your configuration applied:
$ kubectl get deployments -n cluster
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
ardent-ferret-efs-provisioner       2/2     2            2           4h54m
compute-reserved-node               3/3     3            3           4h54m
crusty-abalone-cluster-autoscaler   1/1     1            1           4h54m
nginx-default-backend               1/1     1            1           4h54m
nginx-service                       3/3     3            3           4h54m
shared-services-reserved-node       0/0     0            0           4h54m
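To confirm the reservation is actually holding nodes, you can list the placeholder pods behind the deployment. The label selector below is taken from the deployment's own metadata (cluster-overprovisioner-name); pod and node names in your output will of course differ:

```shell
# List the placeholder pods and see which nodes they are pinning
kubectl get pods -n cluster \
  -l cluster-overprovisioner-name=compute-reserved-node -o wide

# The cluster's node list should now include the reserved nodes
# even when no queries are running
kubectl get nodes
```

Once a Virtual Warehouse starts a query, these low-priority pods are preempted and the warehouse's executors land on the already-running nodes, avoiding the cold-start wait described earlier.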