Created on 08-31-2020 02:25 PM - edited on 09-03-2020 12:06 AM by VidyaSargur
Cloudera Data Warehouse (CDW) offers highly efficient compute isolation and rapid scale-up and scale-down of data warehousing workloads, leveraging the latest container and caching technologies.
One of the great features of this architecture is the ability to bring up compute only on demand, as illustrated by the figure below:
This default setup is the most cost-effective, as only a few shared services nodes (small nodes running services like UIs, Viz, Zookeeper, etc.) are long-lived. Each Virtual Warehouse has a set of nodes that run only when compute is needed (e.g. a new query on a non-cached dataset).
The caveat to this approach is that on a completely cold warehouse, the warm-up time from zero to compute is one to two minutes.
An alternative to this default architecture is to leverage compute-reserved nodes that are shared between virtual warehouses, as depicted below:
With this architecture, a pool of reserved nodes can be used to enable the immediate availability of compute across nodes. In this article, I will showcase how to set up reserved instances in CDW.
Note: This article is a high-level tutorial. It is not my intent to detail the behavior of how reserved nodes are shared across warehouses, or recommend generic sizing. The number of instances and the VW behavior will depend on your implementation.
Step 1: Get your Kubeconfig
In CDW, go to your environment, click the three dots on the environment box, then select Show Kubeconfig:
Grant your ARN access to the environment, and copy/download the kubeconfig (see this article for more details).
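Once the kubeconfig is downloaded, it is worth confirming that it actually gives you access to the cluster before going further. The snippet below is a minimal sketch, not an official Cloudera procedure: it assumes kubectl is installed locally, that your local AWS credentials correspond to the ARN you granted, and that the file was saved to a path of your choosing. The helper names (kubectl_invocation, list_nodes) and the example path are hypothetical.

```python
import os
import shutil
import subprocess
from pathlib import Path


def kubectl_invocation(kubeconfig_path, *args):
    """Build the kubectl command line and environment for a given kubeconfig file."""
    env = dict(os.environ)
    # kubectl reads cluster and credential settings from the file named in KUBECONFIG.
    env["KUBECONFIG"] = str(Path(kubeconfig_path).expanduser())
    return ["kubectl", *args], env


def list_nodes(kubeconfig_path):
    """Run `kubectl get nodes -o wide` against the cluster described by the kubeconfig."""
    if shutil.which("kubectl") is None:
        raise RuntimeError("kubectl was not found on PATH")
    cmd, env = kubectl_invocation(kubeconfig_path, "get", "nodes", "-o", "wide")
    result = subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)
    return result.stdout


# Example usage (hypothetical path to the file downloaded from the CDW UI):
# print(list_nodes("~/Downloads/kubeconfig.yaml"))
```

If the call returns a list of nodes, the ARN grant and the kubeconfig are working. An authorization error usually means the ARN you granted does not match the identity your local credentials resolve to.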