Created on 05-08-2024 03:23 AM - edited on 05-14-2024 12:27 AM by VidyaSargur
Cloudera Operational Database (COD) serves as a foundational service within the Cloudera Data Platform (CDP), enabling users to effortlessly create operational databases that dynamically scale to meet workload demands. When deploying high-performance applications at scale, a robust operational database plays a crucial role. COD addresses this need by offering a highly scalable and high-performance operational database engineered to support data-intensive applications.
Built on Apache HBase and Apache Phoenix, COD is integrated into the Cloudera Data Platform (CDP) in the public cloud. It supports hybrid as well as multi-cloud deployments across Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
For AWS deployments, COD provides two primary storage options: S3 with Ephemeral Cache, which delivers high performance at a slightly higher cost, and S3 without Ephemeral Cache, which is more budget-friendly but slower. Recently, AWS introduced "Express S3" (Amazon S3 Express One Zone), a storage class that AWS claims is up to 10x faster than standard S3. However, it stores data in a single Availability Zone, which results in diminished durability compared to regular S3. The speed of Express S3 intrigued us, so we explored its potential as a primary storage option that could deliver high performance without relying on Ephemeral Cache.
Consequently, we embarked on evaluating this new storage type, particularly for users who are comfortable with the existing durability parameters of Express S3.
In the following sections, we delve into the benchmarking results, comparing the performance of all three storage types. We provide conclusions that can guide decision-making processes for users leveraging the Cloudera Operational Database on AWS.
We use the Yahoo! Cloud Serving Benchmark (YCSB) framework for our performance testing. YCSB is an open-source benchmarking suite that drives configurable read and write workloads against a database and reports throughput and latency. It is widely adopted for measuring the performance of database systems across multi-node setups, including those deployed on public cloud environments.
For this performance assessment, a 20 TB dataset was generated and stored in an S3 bucket. The same dataset is used for all tests, so the results are directly comparable across storage types.
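To give a rough sense of scale, the sketch below estimates how many YCSB records a dataset of this size would hold. It assumes YCSB's default record layout (`fieldcount=10`, `fieldlength=100`, roughly 1 KB of field data per record) and decimal terabytes, and it ignores row keys, HBase storage overhead, and compression. The actual record layout used in these tests may differ, so treat the figure as an order-of-magnitude estimate only.

```python
# Rough record-count estimate for a YCSB dataset.
# Assumption: YCSB defaults of 10 fields x 100 bytes each; the record
# layout used for the tests in this article is not stated here.
DEFAULT_FIELD_COUNT = 10
DEFAULT_FIELD_LENGTH_BYTES = 100


def records_for_dataset(dataset_bytes,
                        field_count=DEFAULT_FIELD_COUNT,
                        field_length=DEFAULT_FIELD_LENGTH_BYTES):
    """Return the approximate number of records that fit in dataset_bytes.

    Ignores row keys, HBase cell overhead, and compression.
    """
    record_bytes = field_count * field_length
    return dataset_bytes // record_bytes


if __name__ == "__main__":
    twenty_tb = 20 * 10**12  # decimal terabytes
    print(f"{records_for_dataset(twenty_tb):,} records (approx.)")
```

Under these assumptions, 20 TB corresponds to roughly 20 billion records.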
Key Dataset Details:
The benchmarking environment is configured within the AWS infrastructure, with the following specifications:
The benchmarking tests were conducted using the YCSB tool, with a focus on specific workloads tailored to assess the performance characteristics of the Cloudera Operational Database.
YCSB Workloads Employed:
Additional Information: In the case of S3 with Ephemeral Cache, the cache was 100% warmed up before running the tests.
The table below presents all the collected performance indicators across different storage types:
The charts below provide comparisons of key performance indicators.
The above chart illustrates the average throughput observed during the YCSB tests. Notably, S3 with Ephemeral Cache demonstrates a throughput approximately 15-20 times higher than S3 without Ephemeral Cache. Although Express S3, which operates without a cache, displays promising performance compared to standard S3, it falls short of the performance levels achieved by S3 with Ephemeral Cache.
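A throughput ratio like the one described above can be derived from the summary lines that YCSB prints at the end of each run (for example, `[OVERALL], Throughput(ops/sec), <value>`). The following is a minimal sketch of that calculation, not the tooling used for this article. The function names are hypothetical, and the sample outputs contain made-up placeholder numbers rather than measured results.

```python
import re

# Matches YCSB summary lines such as "[OVERALL], Throughput(ops/sec), 1234.5".
_SUMMARY_LINE = re.compile(
    r"^\[(?P<section>[^\]]+)\],\s*(?P<metric>[^,]+?),\s*"
    r"(?P<value>[0-9]+(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?)\s*$"
)


def parse_ycsb_summary(text):
    """Return {(section, metric): value} for each numeric summary line."""
    metrics = {}
    for line in text.splitlines():
        match = _SUMMARY_LINE.match(line.strip())
        if match:
            key = (match.group("section"), match.group("metric"))
            metrics[key] = float(match.group("value"))
    return metrics


def metric_ratio(run_a, run_b, section, metric):
    """Return run_a's value divided by run_b's value for one metric."""
    key = (section, metric)
    return parse_ycsb_summary(run_a)[key] / parse_ycsb_summary(run_b)[key]


# Placeholder outputs with invented numbers, for illustration only.
SAMPLE_RUN_A = """[OVERALL], RunTime(ms), 60000
[OVERALL], Throughput(ops/sec), 1000.0
[READ], AverageLatency(us), 500.0
"""
SAMPLE_RUN_B = """[OVERALL], RunTime(ms), 60000
[OVERALL], Throughput(ops/sec), 250.0
[READ], AverageLatency(us), 2000.0
"""

if __name__ == "__main__":
    ratio = metric_ratio(SAMPLE_RUN_A, SAMPLE_RUN_B,
                         "OVERALL", "Throughput(ops/sec)")
    print(f"Run A throughput is {ratio:.1f}x run B")
```

The same helper works for latency by passing `"READ"` and `"AverageLatency(us)"`, in which case a ratio below 1 means the first run had lower latency.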
The chart above depicts the latency observed during read-based workloads. S3 with Ephemeral Cache exhibits significantly lower read latency when compared to other storage types. Express S3 also demonstrates improved latency performance compared to standard S3.
Based on these results, S3 with Ephemeral Cache is the best-performing storage option for the Cloudera Operational Database. Express S3 improves on standard S3 but does not match the performance of S3 with Ephemeral Cache. Moreover, because Express S3 is confined to a single zone, it may not be the most suitable choice for users who need optimal performance and durability at the same time.
For further insights into performance evaluations of the Cloudera Operational Database (COD), the following resources may be of interest:
For additional information on Cloudera Operational Database, including product features and capabilities, visit the product page or reach out to your account team for personalized assistance.