Created on 05-02-2024 10:54 PM - edited on 05-03-2024 01:56 AM by VidyaSargur
Cloudera Operational Database (COD) is a service that runs on the Cloudera Data Platform (CDP). COD enables you to create a new operational database that automatically scales based on your workload. Deploying high-performance applications at scale requires a robust operational database. COD is a high-performance, highly scalable operational database designed to power the biggest data applications on the planet, at any scale. Powered by Apache HBase and Apache Phoenix, COD ships out of the box with CDP in the public cloud. It is also ready for hybrid and multi-cloud deployments, whether on AWS, Microsoft Azure, or GCP, to meet your business where it is today.
Support for cloud storage is an important capability of COD: in addition to the pre-existing support for HDFS on local storage, it offers customers a choice of price-performance characteristics. For more information on the performance differences between COD on HDFS and COD on cloud storage with ephemeral cache (Amazon AWS and Microsoft Azure), refer to the blog.
To understand how COD delivers the most cost-efficient performance for your applications, let’s dive into benchmarking results comparing COD on different cloud storage services.
The tests were performed on a dataset that was created on AWS using the Yahoo! Cloud Serving Benchmark (YCSB) test framework. YCSB is an open-source benchmarking suite for performance evaluations. It is frequently used to measure the performance of multi-node database systems on the public cloud and other distributed infrastructure.
In this performance evaluation, a large 20 TB dataset was generated and backed up to an S3 bucket for later use. The same data was then exported to Azure and GCP so that the performance tests on all three platforms used identical data, for a fair comparison.
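To give a sense of scale, the following Python sketch estimates how many YCSB records a 20 TB dataset corresponds to. It assumes YCSB's default CoreWorkload record layout (`fieldcount=10`, `fieldlength=100`, so about 1,000 bytes of raw payload per record), which may not match the settings used for this benchmark, and it ignores row keys, HBase cell overhead, compression, and replication. `raw_record_bytes` and `estimate_recordcount` are hypothetical helpers.

```python
# Rough sizing sketch: how many YCSB records correspond to a dataset of a given size?
# ASSUMPTION: YCSB CoreWorkload defaults (fieldcount=10, fieldlength=100 bytes).
# The settings actually used in this benchmark may differ. Row keys, HBase cell
# overhead, compression, and replication are all ignored.

TB = 10**12  # decimal terabyte, in bytes


def raw_record_bytes(fieldcount: int = 10, fieldlength: int = 100) -> int:
    """Raw payload of one YCSB record: field values only, no key or overhead."""
    return fieldcount * fieldlength


def estimate_recordcount(dataset_bytes: int, fieldcount: int = 10,
                         fieldlength: int = 100) -> int:
    """Approximate YCSB `recordcount` that yields `dataset_bytes` of raw payload."""
    return dataset_bytes // raw_record_bytes(fieldcount, fieldlength)


if __name__ == "__main__":
    n = estimate_recordcount(20 * TB)
    print(f"~{n:,} records of {raw_record_bytes()} bytes each for 20 TB")
```

Under these assumptions the script prints `~20,000,000,000 records of 1000 bytes each for 20 TB`. Because HBase stores each field as a separate cell that carries its own row key, column family, qualifier, and timestamp, the on-disk size per record is larger than the raw payload, so the real record count for a given dataset size would be lower.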
This article measures the performance differences between COD on Amazon AWS, Microsoft Azure, and Google GCP with ephemeral cache. It does not evaluate the performance of cloud storage, local disks, or block storage independently.
The details of the dataset used for these performance tests are as follows:
The tests were run using the YCSB tool. The details are given below:
The charts below compare AWS, Azure, and GCP with the ephemeral cache 100% warmed up, which ensures that most of the blocks are served from the cache.
The charts below show the time taken to warm up the cache for COD on Amazon AWS and COD on GCP. COD on AWS took twice as long to warm up the cache as COD on GCP.
The following chart compares key performance indicators across the AWS, Azure, and GCP cloud platforms:
The following chart shows the average throughput observed while running the YCSB tests. The average throughput of HBase on Google GCS was higher than that of HBase on Amazon AWS and Microsoft Azure across the different workload types. Hence, HBase with Google GCS delivered better overall performance than the other cloud providers.
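YCSB prints a summary at the end of each run, including an `[OVERALL], Throughput(ops/sec)` line (operations completed divided by run time) and per-operation lines such as `[READ], AverageLatency(us)`. A minimal Python sketch for extracting those numbers from a saved run log, so they can be tabulated or charted per platform, might look like this. `parse_ycsb_summary` and `summarize` are hypothetical helpers, and the exact set of metrics YCSB prints depends on its version and measurement settings.

```python
import sys
from collections import defaultdict
from typing import Dict


def parse_ycsb_summary(text: str) -> Dict[str, Dict[str, float]]:
    """Parse YCSB summary lines such as "[OVERALL], Throughput(ops/sec), 2500.0"
    into {"OVERALL": {"Throughput(ops/sec)": 2500.0}, ...}.

    Lines that do not have the "[SECTION], metric, number" shape are skipped.
    """
    metrics: Dict[str, Dict[str, float]] = defaultdict(dict)
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue
        section, metric, value = parts
        if not (section.startswith("[") and section.endswith("]")):
            continue
        try:
            metrics[section[1:-1]][metric] = float(value)
        except ValueError:
            continue  # non-numeric value, e.g. a status message
    return dict(metrics)


def summarize(metrics: Dict[str, Dict[str, float]]) -> dict:
    """Overall throughput plus average latency (microseconds) per operation type."""
    return {
        "throughput_ops_sec": metrics["OVERALL"]["Throughput(ops/sec)"],
        "avg_latency_us": {
            op: values["AverageLatency(us)"]
            for op, values in metrics.items()
            if "AverageLatency(us)" in values
        },
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ycsb_summary.py <saved YCSB output file>  (hypothetical script name)
    with open(sys.argv[1]) as f:
        print(summarize(parse_ycsb_summary(f.read())))
```

Keeping the parser separate from `summarize` makes it easy to add other metrics (for example percentile latencies) without changing how the log is read.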
The following chart shows the latency observed while running workloads involving reads.
The results show that HBase with Google GCS has lower latency than HBase with Amazon AWS and Microsoft Azure for the read-only workload (workload-c), while latencies are comparable for a mixed workload such as workload-a.
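The difference between the two workloads named above comes down to the operation mix defined in YCSB's core workload files: `workloada` issues 50% reads and 50% updates, while `workloadc` issues only reads. The Python sketch below models only that mix; `Workload` and `choose_operations` are hypothetical names, and the sketch ignores key selection (the core workloads default to a Zipfian request distribution), record sizes, and client concurrency.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Workload:
    """Operation mix of a YCSB-style workload: operation name -> fraction of requests."""
    name: str
    proportions: Dict[str, float]

    def is_read_only(self) -> bool:
        # Read-only means every non-read operation has zero weight.
        return all(op == "READ" or p == 0 for op, p in self.proportions.items())


# Mixes from YCSB's core workload files (readproportion / updateproportion).
WORKLOAD_A = Workload("workloada", {"READ": 0.5, "UPDATE": 0.5})  # mixed read/update
WORKLOAD_C = Workload("workloadc", {"READ": 1.0})                  # read-only


def choose_operations(workload: Workload, n: int, seed: int = 0) -> List[str]:
    """Draw n operations according to the workload's proportions.
    Key selection (Zipfian in the core workloads) is deliberately not modelled."""
    rng = random.Random(seed)
    ops = list(workload.proportions)
    weights = [workload.proportions[op] for op in ops]
    return rng.choices(ops, weights=weights, k=n)


if __name__ == "__main__":
    for wl in (WORKLOAD_A, WORKLOAD_C):
        ops = choose_operations(wl, 10_000)
        share = ops.count("READ") / len(ops)
        print(f"{wl.name}: read-only={wl.is_read_only()}, read share ~{share:.2f}")
```

This is why a read-only workload such as workload-c isolates read-path behaviour, while workload-a also exercises the HBase write path through its updates.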
The following chart shows the latency observed while running workloads involving writes.
The results show that the write latency of HBase with Google GCS is lower than that of HBase with Amazon AWS and Microsoft Azure by a large margin.
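When reading latency charts like these, one way to make a phrase such as "by a large margin" concrete is to express each platform's value relative to a baseline, either as a percentage reduction or as a ratio. The helpers below are a hypothetical sketch of that arithmetic only; no measured values from this benchmark are included.

```python
def percent_lower(baseline: float, candidate: float) -> float:
    """How much lower `candidate` is than `baseline`, as a percentage of `baseline`.
    A negative result means `candidate` is higher (worse, for latency)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline * 100.0


def times_faster(baseline: float, candidate: float) -> float:
    """Ratio baseline / candidate; e.g. 2.0 means candidate takes half the time."""
    if candidate <= 0:
        raise ValueError("candidate must be positive")
    return baseline / candidate


if __name__ == "__main__":
    # Illustrative inputs only, not results from this benchmark.
    print(percent_lower(200.0, 50.0), times_faster(200.0, 50.0))
```

The same ratio also describes the cache warm-up comparison: `times_faster(aws_warmup_time, gcp_warmup_time) == 2.0` corresponds to COD on AWS taking twice as long to warm up as COD on GCP.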
The above comparison shows that GCP with GCS performed better than Amazon AWS and Microsoft Azure, with higher overall throughput and lower read and write latencies while running the workloads. The write latencies for GCP with GCS were far lower than on the other two platforms, owing to the performance of the block storage used alongside GCS.
A similar experiment compared the performance of COD running on HDFS with that of COD running on cloud storage provided by Amazon AWS and Microsoft Azure. The details of these experiments can be found in the blog titled Cloudera Operational Database (COD) Performance Benchmarking: Comparing HDFS and Cloud Storage.
A detailed description of how to run YCSB for HBase can be found in the blog titled How to run YCSB for HBase.
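The blog mentioned above covers the full procedure. As orientation, a YCSB benchmark has a load phase, which inserts `recordcount` records, and a run phase, which executes the workload's operation mix for `operationcount` operations. The Python sketch below only assembles the command lines, following the shape shown in the YCSB HBase binding's README (`bin/ycsb load|run <binding> -P <workload file> -cp <HBase conf dir> -p table=usertable -p columnfamily=family`). The `ycsb_command` helper, the binding name, the paths, and the thread count are assumptions for illustration, and the target HBase table and column family must exist before the load phase.

```python
import shlex
from typing import Dict, List, Optional


def ycsb_command(phase: str, workload_file: str, hbase_conf_dir: str, *,
                 binding: str = "hbase2", table: str = "usertable",
                 column_family: str = "family", threads: int = 32,
                 props: Optional[Dict[str, object]] = None,
                 ycsb_bin: str = "bin/ycsb") -> List[str]:
    """Assemble a YCSB command line for the HBase binding.

    ASSUMPTIONS: the binding name ("hbase2"), thread count, and paths are placeholders;
    the binding name in particular depends on the YCSB release you use.
    """
    if phase not in ("load", "run"):
        raise ValueError("phase must be 'load' (insert records) or 'run' (execute workload)")
    argv = [ycsb_bin, phase, binding,
            "-P", workload_file,          # workload parameter file
            "-cp", hbase_conf_dir,        # puts the HBase client config on the classpath
            "-p", f"table={table}",
            "-p", f"columnfamily={column_family}",
            "-threads", str(threads),
            "-s"]                         # print periodic status to stderr
    for key, value in (props or {}).items():
        argv += ["-p", f"{key}={value}"]
    return argv


if __name__ == "__main__":
    conf = "/etc/hbase/conf"  # hypothetical HBase client configuration directory
    load = ycsb_command("load", "workloads/workloada", conf, props={"recordcount": 1_000_000})
    run = ycsb_command("run", "workloads/workloadc", conf, props={"operationcount": 1_000_000})
    for argv in (load, run):
        print(shlex.join(argv))  # or pass argv to subprocess.run(argv, check=True)
```

Building the command as an argument list rather than a single string avoids shell-quoting problems and lets the same list be printed for review or passed directly to `subprocess.run`.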
Visit the product page to learn more about the Cloudera Operational Database or reach out to your account team.