Created on 05-02-2024 01:01 PM - edited 05-02-2024 01:19 PM
Apache HBase provides several caching options. In Cloudera Operational Database (COD) deployments that use cloud storage, we deploy the file-based BucketCache implementation over ephemeral SSD volumes to cache the whole user dataset, so that client reads do not have to reach the slower cloud storage layer. However, the file-based BucketCache implementation was originally volatile: the cache was not retained across region server restarts. Every restart therefore required a warm-up period before optimal read performance was restored. Depending on the data size and the configured cache size, this warm-up can take anywhere from a few minutes to a few hours.
To eliminate the warm-up, the Cloudera team and the HBase community implemented the BucketCache persistence feature (HBASE-27264/HBASE-27486), in which region servers periodically persist the index of the blocks cached in the bucket cache. This persisted information is then used to recover the cache when a region server restarts, whether after a normal restart or a crash.
Caching is critical for the performance of COD clusters using cloud storage, but distributing client request load evenly among region servers is equally important. We therefore needed to make the built-in HBase balancer "aware" of cache usage when it decides to move regions around. The current default HBase balancer, the StochasticLoadBalancer, does not consider the caching state. As a result, a region that is already "fully" cached on one region server can be moved to another region server; the cached data for that region is then thrown away and must be cached afresh on the newly assigned region server.
Meet the new CacheAwareLoadBalancer (HBASE-27389), which considers the cache allocation of each region on the region servers when calculating a new assignment plan. It uses the cache allocation information reported by the region servers to calculate the percentage of data cached for each region on each region server, and then uses that percentage as a factor when deciding on an optimal new assignment plan.
The HBase master collects the cache information from all the region servers and uses it to decide region assignments with minimal impact on the warmed-up cache. A region is assigned to a region server where it has a higher cache ratio than on the region server currently hosting it.
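The selection rule described above can be illustrated with a minimal sketch. It is not the balancer's actual code: the function name `preferred_server`, the `min_gain` parameter, and the ratio values are hypothetical, and the real balancer also weighs region skew before moving anything.

```python
def preferred_server(current_server, cache_ratios, min_gain=0.0):
    """Pick the server that holds the largest cached share of one region.

    cache_ratios maps server name -> cache ratio (0.0 to 1.0) reported for
    this region. The region stays where it is unless another server beats
    the current one by more than min_gain (a hypothetical threshold).
    """
    current_ratio = cache_ratios.get(current_server, 0.0)
    best_server = max(cache_ratios, key=cache_ratios.get, default=current_server)
    if cache_ratios.get(best_server, 0.0) - current_ratio > min_gain:
        return best_server
    return current_server


# Illustrative values only: the region currently sits on worker3, but
# worker6 still holds most of its blocks from an earlier assignment.
ratios = {"worker3": 0.10, "worker6": 0.95}
choice = preferred_server("worker3", ratios)
print(choice)  # worker6
```

With equal ratios the function keeps the region on its current server, which avoids moves that would not improve cache usage.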
The CacheAwareLoadBalancer uses two cost elements for deciding the region allocation. These are described below:
The cache cost is calculated as the percentage of data for a region cached on the region server where it is either currently hosted or was previously hosted. A region may have multiple HFiles, each of different sizes. An HFile is considered to be fully cached when all the data blocks in this file are in the cache. The region server hosting this region calculates the ratio of the size of HFiles cached in the bucket cache to the total size of HFiles in the region. This ratio will vary from 0 (region hosted on this server, but none of its HFiles are cached into the bucket cache) to 1 (region hosted on this server and all the HFiles for this region are cached into the bucket cache).
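As a rough illustration of the ratio described above, the sketch below sums the sizes of fully cached HFiles and divides by the total HFile size of the region. The `HFileInfo` type, the file names, and the sizes are hypothetical; treating an empty region as uncached is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class HFileInfo:
    name: str
    size_bytes: int
    fully_cached: bool  # True only when every data block of the file is cached


def region_cache_ratio(hfiles):
    """Return cached HFile bytes divided by total HFile bytes (0.0 to 1.0)."""
    total = sum(f.size_bytes for f in hfiles)
    if total == 0:
        return 0.0  # assumption: a region without HFiles counts as uncached
    cached = sum(f.size_bytes for f in hfiles if f.fully_cached)
    return cached / total


# Hypothetical region with three HFiles of different sizes.
hfiles = [
    HFileInfo("hfile-a", 600, True),
    HFileInfo("hfile-b", 300, False),
    HFileInfo("hfile-c", 100, True),
]
ratio = region_cache_ratio(hfiles)
print(ratio)  # 0.7
```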
Every region server maintains this information for all the regions it currently hosts. It also maintains the cache ratio for regions it previously hosted, which provides historical information about those regions for as long as their blocks have not been evicted.
The skewness cost is calculated from the number of regions hosted on each region server in the cluster. It varies from 0 (regions are equally distributed across the region servers) to 1 (regions are not equally distributed across the region servers).
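The article does not give the exact formula, so the following is only one plausible way to map per-server region counts onto the 0 to 1 range described above. It assumes the cost is the total absolute deviation from the mean, normalized by the worst case in which a single server hosts every region; the function name is hypothetical.

```python
def skewness_cost(region_counts):
    """Map per-server region counts to a cost between 0.0 and 1.0.

    0.0 means every server hosts the same number of regions; 1.0 means a
    single server hosts all of them. This normalization is an assumption
    made for illustration.
    """
    servers = len(region_counts)
    total = sum(region_counts)
    if servers < 2 or total == 0:
        return 0.0
    mean = total / servers
    deviation = sum(abs(count - mean) for count in region_counts)
    # Worst case: one server holds all regions, the others hold none.
    worst = 2 * total * (servers - 1) / servers
    return deviation / worst


print(skewness_cost([10, 10, 10]))  # 0.0
print(skewness_cost([20, 10, 0]))   # 0.5
print(skewness_cost([30, 0, 0]))    # 1.0
```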
The balancer considers these two costs and calculates the resulting cost of maintaining the distribution of the current regions in the cluster. The balancer will attempt to rebalance the cluster under the following conditions:
The cluster can be made to use the CacheAwareLoadBalancer by setting the following configuration properties:
The hbase.master.loadbalancer.class property defines the load balancer class used in the cluster. The default is StochasticLoadBalancer. Set the property as follows for the cluster to use the CacheAwareLoadBalancer.
<property>
<name>hbase.master.loadbalancer.class</name>
<value>org.apache.hadoop.hbase.master.balancer.CacheAwareLoadBalancer</value>
</property>
The hbase.bucketcache.persistent.path property defines the location of the file where the region servers persist the cache index information. If it is set, the region servers periodically write the cache index to the given file on the local path specified. When a region server restarts, it reloads this information. The CacheAwareLoadBalancer relies on this information to decide on region assignments and will not work if this property is not set.
<property>
<name>hbase.bucketcache.persistent.path</name>
<value>/path/to/cache-index-file</value>
</property>
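Because the CacheAwareLoadBalancer depends on both properties, a quick check of the configuration file can catch a missing one before a rollout. The sketch below is an illustrative helper, not part of HBase: the function name `missing_settings` is hypothetical, and it assumes the usual `<configuration>` root element of an hbase-site.xml file.

```python
import xml.etree.ElementTree as ET

# Property name -> required value (None means any non-empty value is accepted).
REQUIRED = {
    "hbase.master.loadbalancer.class":
        "org.apache.hadoop.hbase.master.balancer.CacheAwareLoadBalancer",
    "hbase.bucketcache.persistent.path": None,
}


def missing_settings(hbase_site_xml):
    """Return the required property names that are absent or have a wrong value."""
    root = ET.fromstring(hbase_site_xml)
    found = {
        (prop.findtext("name") or "").strip(): (prop.findtext("value") or "").strip()
        for prop in root.iter("property")
    }
    problems = []
    for name, expected in REQUIRED.items():
        value = found.get(name, "")
        if not value or (expected is not None and value != expected):
            problems.append(name)
    return problems


# Sample configuration that sets the balancer class but not the persistence path.
sample = """<configuration>
  <property>
    <name>hbase.master.loadbalancer.class</name>
    <value>org.apache.hadoop.hbase.master.balancer.CacheAwareLoadBalancer</value>
  </property>
</configuration>"""

print(missing_settings(sample))  # ['hbase.bucketcache.persistent.path']
```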
The CacheAwareLoadBalancer calculates the region assignment plan while considering the regions already in the cache, which reduces the need to re-cache them. This results in reduced I/O to the underlying cloud storage for reading blocks during region movement. To achieve this, the balancer attempts to assign each region to the host where more of its data is in the cache.
We conducted a few experiments to monitor the region movement and its impact on the cache with this load balancer and compared them with the default StochasticLoadBalancer. The results of these experiments are summarized below.
TestTable, {TABLE_ATTRIBUTES => {METADATA => {'hbase.store.file-tracker.impl' => 'FILE'}}}
COLUMN FAMILIES DESCRIPTION
{NAME => 'info0', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0',
REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false',
COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536',
METADATA => {'IN_MEMORY_COMPACTION' => 'NONE'}}
The following sections compare the impact on the cache when the cluster uses the CacheAwareLoadBalancer and when it uses the default StochasticLoadBalancer.
In all these experiments, the cache was 100% warmed up before starting the experiments.
In this experiment, a single region server in the cluster was restarted to observe the impact on the cache once the region server has restarted and the balancer has run.
The following charts show the impact on the cache: the first with the StochasticLoadBalancer and the second with the CacheAwareLoadBalancer.
The results clearly show that there is minimal impact on the cache when a single region server is restarted.
The CacheAwareLoadBalancer also attempts to reassign the region back to the region server where it was hosted before it was restarted to make use of the blocks already cached for that region on that region server. This region transition is shown in the following charts.
The following snippet shows the region assignments before the region server named worker6 was restarted (the highlighted region below is hosted on worker6):
Just after the region server was restarted, the region was immediately moved to worker3 as highlighted in the chart below:
After the region server restarts, it starts sending its cache information to the master. The CacheAwareLoadBalancer running on the master uses this information to determine whether the cluster needs to be balanced and, if so, generates a plan to move regions that takes into account the blocks already cached on the region servers. It therefore attempts to reassign the region to the server that hosted it earlier. The chart below shows the region assignment after the balancer has run.
Here we can see that the region is assigned back to worker6 after the balancer is run.
In this experiment, multiple region servers in the cluster were restarted simultaneously to observe the impact on the cache after the balancer run finished.
The following charts show the impact on the cache when multiple region servers were restarted, first with the cluster using the StochasticLoadBalancer and then with the cluster using the CacheAwareLoadBalancer.
Cluster restart
This experiment was conducted to observe the impact on the cache when the whole cluster is restarted with the cache fully warmed up.
The chart on the left shows the impact on the cache when the cluster is running with the StochasticLoadBalancer while the chart on the right shows the impact on the cache when the cluster is running with the CacheAwareLoadBalancer:
The above charts show that restarting the whole cluster had no impact on the cache: the cluster immediately had all the cache restored to its state prior to the restart.
In this test, the cache in the cluster was fully warmed up before a rolling restart was performed. The observations made during the rolling restart are summarized below:
The chart on the left shows the impact on the cache on a cluster running StochasticLoadBalancer while the chart on the right shows the impact on the cache on a cluster running CacheAwareLoadBalancer:
The chart above shows that there was a minimal impact on the cache when the rolling restart operation was performed on the cluster running the CacheAwareLoadBalancer.
The CacheAwareLoadBalancer helps HBase retain cached data, reducing the need to read it back from the underlying high-latency cloud storage, while keeping region distribution even across the cluster. Without this feature, read performance for COD with cloud storage can be compromised considerably, as region movement could result in more cache misses and higher latency for client reads. Disabling the balancer altogether could help avoid such impacts on the cache, but that increases operational complexity, requiring constant monitoring and manual region movement to avoid RegionServer hotspotting.