Community Articles

asinghal · ‎09-10-2024

Introduction

Cloudera Operational Database is a service that runs on the Cloudera Data Platform (CDP). Cloudera Operational Database enables you to create a new operational database that automatically scales based on your workload.

In this blog, we’ll discuss a new autoscaling mechanism in Cloudera Operational Database that is designed to address the computing needs of workloads dynamically in the fastest possible manner. This new mechanism is most beneficial for HDFS storage-based clusters, but is also useful for non-ephemeral cache-based clusters using cloud storage.

Note: This type of auto-scaling is not supported for clusters using large SSD-based cache with cloud storage.

Background

Autoscaling is an important aspect of cloud computing that helps organizations manage their computing resources effectively and ensures cost efficiency. It enables organizations to deal with unpredictable workloads and provides the advantages of improved resource utilization and cost-effectiveness.

In Cloudera Operational Database, the worker instances of the cluster host the region servers of HBase as well as HDFS data nodes. The earlier implementation of autoscaling in Cloudera Operational Database automatically scaled the worker instance group, which includes the data nodes. The commissioning and the decommissioning of these worker nodes takes time as it involves creating a new virtual machine (VM) and commissioning it to the cluster.

The new auto-scaling mechanism addresses this problem of delay by having VMs ready in a suspended state allowing them to be resumed within minutes to join the cluster. Similarly, the mechanism also improves the scale-down duration by just stopping the virtual machines and keeping them in a state, without actually decommissioning them.

The scaling decision was based on metrics collected by Cloudera Manager (CM) to determine the system's load state. These metrics can be broadly categorized into compute load metrics and data load metrics, which includes:

The combined RPC latencies of various operations for compute loads, such as scan latency, get latency, queue times, and others

HDFS usage (data load)
Overall region density on the region servers (data load)
Overall buffer cache utilization on the region servers (data load)

Relying on RPC latencies to deduce higher compute loads on the system can be tricky due to factors like network delays or IO delays which can result in higher RPC latencies. Therefore, we need metrics that can check the CPU constraint on the system and add the necessary resources.

Prerequisites for the new mechanism

The new mechanism is only applicable for the database clusters of the following types:

HDFS and non-ephemeral cached enabled clusters: This feature is enabled for the clusters with HDFS storage and for clusters with cloud storage where the ephemeral cache is not enabled.
Lightweight and heavyweight clusters: This feature is not available for MICRO-scaled clusters.
Cloudera Operational Database version 1.37.0 or later: This feature is enabled only for those newly created databases on Cloudera Operational Database version 1.37.0 or later. If a database is created on an earlier version of Cloudera Operational Database then this feature will not be not available after the upgrade to the newer version.
Contact Cloudera Support to enable the feature.

The new autoscaling mechanism

The mechanism of auto-scaling has been revamped to improve the autoscaling execution time and also to marginally reduce the TCO of the overall system.

The following aspects of the mechanism help in improving the overall performance of autoscaling.

The new “Compute” instance group

A new instance group named “compute” has been introduced in the cluster along with the worker instance group in Cloudera Operational Database. This group consists of nodes in the system that are similar to the worker nodes in the cluster. However, these nodes do not host the data services (HDFS data nodes). These nodes only provide additional computing resources to the cluster. This avoids the addition of storage to the cluster when the requirement is only for additional compute resources and not for the data. This instance group is scaled up whenever there is a higher compute requirement in the cluster.

These compute nodes use the data nodes on the worker instances to store the data. The diagram below shows the layout of Cloudera Operational Database with compute nodes in the cluster.
Screenshot 2024-09-10 at 11.00.56 PM.png

The worker instance group is scaled only when there is a higher data storage requirement.

Separate metrics for compute and data requirements

The following metrics are used to scale the “compute” and “worker” instance groups individually.

Compute Metrics

These metrics are used to scale the compute instance group in the cluster.

CPU_UTILIZATION: This metric represents the average CPU utilization on all the nodes in the cluster.
Combined metrics of RPC latencies: This metric, earlier used to scale the worker instance group, is now used to scale the compute instance group since it represents the compute load on the cluster.

Data Metrics

These metrics are used to scale the worker instance group in the cluster.

HDFS storage across all nodes in the cluster.
Region density across the region servers in the cluster.
Buffer cache usage across all the nodes in the cluster

This clear segregation of the compute and data metrics helps us in scaling the appropriate instance groups in the cluster. Particularly, if the data storage is not unnecessarily scaled when the compute metrics threshold is breached, thereby reducing the additional costs associated with the storage.

Fast Autoscaling with SUSPEND-RESUME Mechanism

CloudBreak, the cloud infrastructure tool, provides a mechanism for fast autoscaling. This mechanism involves creating the maximum number of nodes in the instance group. The required number of nodes is kept in the running state. The remaining unused nodes are suspended. The purpose, here, is to initialize the node with all the required resources and services and keep it in the suspended state. When required, these nodes are started. The duration of resuming the node is much shorter than the node creation and initialization time. Hence, the overall time of scale-up execution is reduced. When the requirement of the additional nodes goes down, the nodes are again suspended.

This mechanism is used for scaling the “compute” instance group. The worker instance group uses the legacy mechanism of node creation and deletion as per the requirement.

Feature usage

The usage of compute nodes is not enabled by default on the cluster. The “compute” instance group is created but may be empty (without any compute instances). The compute nodes can be enabled on the database cluster during the database creation (CDP CLI create-database command) or with the database update (CDP CLI update-database command) by providing the following auto-scaling parameters to the command.

--auto-scaling-parameters '{"maxCpuUtilization": <percent_cpu_utilization_threshold>, "minComputeNodesForDatabase":<minimum_number_of_compute_nodes>, "maxComputeNodesForDatabase":<maximum_number_of_compute_nodes>}'

where maximum_number_of_compute_nodes >= minimum_number_of_compute_nodes and 0 <= percent_cpu_utilization_threshold < 100

Feature benefits

This new mechanism of autoscaling helps in reducing the cluster node instantiation time from 10+ minutes to below three minutes when the autoscaling is triggered because of higher compute requirements.
It also helps in the reduction of TCO during the scale-up for higher computing requirements, as the compute nodes have minimum attached storage and do not lead to unnecessary increases in storage even for higher compute requirements.

There are other stages of autoscaling like region redistribution, which additionally needs to be executed before the newly added instances become effectively available. The marginal increase in the cost is minimal because when the nodes are not used and in the suspended state, the cost is negligible.

With the suspension and resumption of the VM instances for autoscaling, the instance ramp-up time is reduced from 10+ minutes to less than three minutes.

Performance analysis

Throughput Analysis

Performance runs using YCSB load tool confirmed that adding compute nodes to the cluster increased the throughput of the overall system. The workload F (read/read-modify-write ratio: 50/50) of YCSB load was executed on the setups by incrementally adding more load on the system by increasing the client threads. This throughput is comparable with the throughput of a cluster where the same number of worker nodes are added. The use of compute nodes alone, without local datanodes, doesn't seem to affect the throughput of the cluster when compared with the workers having local datanodes. This is because the EBS volumes used by the workers have to be accessed through the network regardless, so the additional round trip between the worker and compute nodes is found to have less impact on performance during the test.

The compute nodes provide the performance that is at par with the worker nodes at lower TCO. The latencies were also comparable between the six worker setup and the setup with three workers and three compute nodes. The following YCSB tool command was used to collect the performance numbers:

./bin/ycsb run hbase20 -p columnfamily=cf -P workloads/workloadf -p requestdistribution=zipfian -p operationcount=500000 -p recordcount=500000 -p maxexecutiontime=900 -threads <thread_count> -s

Varying the thread_count between 128 and 624.

The chart below highlights the comparisons between the throughputs on these setups.

Screenshot 2024-09-10 at 11.05.28 PM.png

The table below highlights the improvements in the instance start-up time during scale-up and scale-down events.

Events

Worker Node

Compute Node

Improvement

Scale-up

(Instance startup time)

11 minutes

3 minutes

72%

Scale-down

(Instance shutdown)

6 minutes

3 minutes

50%

About YCSB

YCSB is an open-source specification and program suite for evaluating the retrieval and maintenance capabilities of computer programs. It is a very popular tool used to compare the relative performance of NoSQL database management systems.

To use YCSB to test the performance of the operational database, check out the blog How to Run YCSB for HBase.

Next Steps

In today’s world, during high loads, time is a crucial factor in making the resources available to the cluster to avoid negative impacts on the overall system. The new implementation of auto-scaling is an important step in this direction that helps reduce the amount of time taken for the instances to be available to the cluster in a cost-effective way. Visit the product page to find out more about Cloudera Operational Database.

Cloudera Community