There are many situations in which running benchmarks for certain workloads on Apache Phoenix can provide meaningful insight into an installation. Commonly, such a benchmark is used to understand the baseline characteristics of a new installation of Apache Phoenix. Alternatively, the ability to re-run the same benchmark after changing a
configuration property is extremely useful in understanding the effect of that change. Many approaches exist to
test systems that have a SQL interface, many of them focused on a specific type of workload. The following approaches
aim to describe a few benchmarks which users can run on their own and tweak to a workload which makes sense for their
Apache JMeter Automation
Apache JMeter is a tool which was initially designed to test web applications; however, it can also execute SQL queries through a JDBC driver. JMeter allows us to define queries to execute against a Phoenix table.
An instance of JMeter can run multiple threads of queries in parallel, with multiple instances of JMeter capable
of spreading clients across many nodes. The queries can also be parameterized with pseudo-random data in order to simulate all
types of queries to a table.
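To make the idea of parameterized queries concrete, here is a minimal Python sketch of generating point and range-scan queries with pseudo-random parameters. It is not JMeter configuration (JMeter expresses this in its test plan), and the table name, column name, key range, and function names are all hypothetical, chosen only for illustration.

```python
import random

# Hypothetical table and column names, used only for illustration.
TABLE = "BENCH.EVENTS"
KEY_COLUMN = "EVENT_ID"
MAX_KEY = 1_000_000


def point_query(rng):
    """Return (sql, params) for a single-row lookup on a random key."""
    sql = f"SELECT * FROM {TABLE} WHERE {KEY_COLUMN} = ?"
    return sql, (rng.randint(1, MAX_KEY),)


def range_query(rng, width=1000):
    """Return (sql, params) for a range scan over `width` consecutive keys."""
    start = rng.randint(1, MAX_KEY - width)
    sql = f"SELECT * FROM {TABLE} WHERE {KEY_COLUMN} >= ? AND {KEY_COLUMN} < ?"
    return sql, (start, start + width)


def generate_queries(count, point_ratio=0.8, seed=None):
    """Build a mixed list of point and range queries with random parameters."""
    rng = random.Random(seed)
    queries = []
    for _ in range(count):
        if rng.random() < point_ratio:
            queries.append(point_query(rng))
        else:
            queries.append(range_query(rng))
    return queries
```

Passing a fixed `seed` makes the generated query mix repeatable, which helps when comparing two runs of the same benchmark before and after a configuration change.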
A number of example queries are also provided which vary in the style of the query, e.g. point queries or range-scan queries. JMeter automates the execution of the queries in parallel. The results of the queries that JMeter ran are also
aggregated and analyzed together to provide an overall view into the performance. Mean and median are provided for
a simple insight, as well as 90th, 95th, 99th and 99.9th percentiles to understand the execution tail.
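As a rough illustration of the aggregation described above, the following Python sketch summarizes a list of query latencies into a mean, a median, and tail percentiles. This is not JMeter's reporting code; the nearest-rank percentile definition and the function names are assumptions made for this example, and a real tool may compute percentiles differently.

```python
import math
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    # round() guards against floating-point noise before taking the ceiling.
    rank = math.ceil(round(pct * len(sorted_values) / 100, 9))
    return sorted_values[max(1, rank) - 1]


def summarize(latencies_ms):
    """Return mean, median, and tail percentiles for latency samples."""
    ordered = sorted(latencies_ms)
    summary = {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
    }
    for pct in (90, 95, 99, 99.9):
        summary[f"p{pct}"] = percentile(ordered, pct)
    return summary
```

The high percentiles only become meaningful with enough samples: with fewer than a thousand measurements, the 99.9th percentile is simply the slowest query observed.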
This approach is extremely useful to execute read-heavy workloads. Indexes can be created over the original TPC-DS
dataset to mimic your real datasets. The provided queries are only a starting point and can be easily expanded to
any other type of query.
The provided README file gives general instructions for generating and querying the data.
Apache Phoenix Pherf
Pherf is a tool which Apache Phoenix provides out of the box to test both read and write performance. It also aims to
provide some means of verifying correctness, but this feature is limited: it is hard to test correctness in ways
other than record counts.
Pherf requires two things to run a test: a schema and a scenario. A schema is a SQL file defining DDL (data
definition language) for some table(s) or index(es). The scenario defines both the write and read tests to execute
against those tables defined in the schema. On the write-side, like the JMeter support, Pherf also supports the generation
of pseudo-random data to populate into the tables. In addition to purely random data, Pherf also has the ability to
specify data to write with given probabilities. The scenario then defines the number of records which should be
inserted into the table given the rules on the data generation. On the read-side, Pherf allows the definition of queries
and the expected outcome of those queries to be run on the tables which were just populated.
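The idea of writing data with given probabilities can be sketched in a few lines of Python. This is only an illustration of the concept, not Pherf's scenario format or implementation; the rule, its values, and the function name are hypothetical.

```python
import random

# Hypothetical rule: each value is paired with the probability of writing it.
STATUS_RULE = [("ACTIVE", 0.7), ("SUSPENDED", 0.2), ("CLOSED", 0.1)]


def generate_column(rule, row_count, seed=None):
    """Generate `row_count` values, each drawn according to the rule's weights."""
    rng = random.Random(seed)
    values = [value for value, _ in rule]
    weights = [weight for _, weight in rule]
    return rng.choices(values, weights=weights, k=row_count)
```

Skewed value distributions like this matter for benchmarks because they affect how selective a filter on the column is, and therefore how much data a query has to scan.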
Pherf can collect metrics about the scenario being executed, but the results are not aggregated and presented for human consumption.
The "Yahoo! Cloud Serving Benchmark" (https://github.com/brianfrankcooper/YCSB) is well-known benchmarking software in the
database field. YCSB has many bindings for both SQL and NoSQL databases, and is commonly used directly by Apache HBase
for performance testing. YCSB has workloads which define how data is written to and read from the tables in the database. A
workload defines the number of records/operations, the ratio of reads/updates/scans/inserts, and the distribution (e.g.
Zipfian) of data to generate. YCSB doesn't provide fine-grained control over the type of data to generate via
configuration (like JMeter and Pherf do), but this can be an advantage: there is less to configure, and the provided YCSB
workloads can serve as "standard" workloads.
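To show what a workload definition of this kind amounts to, here is a small Python sketch that turns an operation mix and a Zipfian key distribution into a plan of operations. It is not YCSB code and does not use YCSB's property names; the dictionary keys, the exponent value, and the function names are assumptions made for illustration.

```python
import bisect
import itertools
import random

# Hypothetical workload description: record count, operation count, and op mix.
WORKLOAD = {
    "record_count": 1000,
    "operation_count": 5000,
    "proportions": {"read": 0.5, "update": 0.3, "scan": 0.15, "insert": 0.05},
    "zipf_exponent": 0.99,  # assumed skew; larger values concentrate on fewer keys
}


def zipfian_cdf(n, exponent):
    """Cumulative weights where rank r has weight 1 / r**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n + 1)]
    return list(itertools.accumulate(weights))


def run_plan(workload, seed=None):
    """Return a list of (operation, key_index) pairs for the workload."""
    rng = random.Random(seed)
    ops = list(workload["proportions"])
    op_weights = list(workload["proportions"].values())
    cdf = zipfian_cdf(workload["record_count"], workload["zipf_exponent"])
    plan = []
    for _ in range(workload["operation_count"]):
        op = rng.choices(ops, weights=op_weights)[0]
        # Inverse-CDF sampling: low indexes (hot keys) are chosen most often.
        key = bisect.bisect_left(cdf, rng.random() * cdf[-1])
        plan.append((op, key))
    return plan
```

With a Zipfian distribution a small set of keys receives most of the traffic, which resembles many real applications more closely than uniformly random access does.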
Like all of the above, YCSB can be executed on one node or run concurrently across many nodes. The results of the
benchmark are reported much as in the JMeter approach (mean, median, and percentiles), but are probably
the most detailed.
YCSB does require some modifications to run against Apache Phoenix (as Phoenix doesn't support the traditional "INSERT"
command and uses "UPSERT" instead). Long term, these modifications will likely land upstream to ease use of YCSB against Phoenix.
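The kind of change involved can be sketched as follows: a write that a generic SQL client would issue as an INSERT has to be expressed as a Phoenix UPSERT. This Python helper only builds the statement text; it is not the actual YCSB modification, and the function, table, and column names are hypothetical.

```python
def build_upsert(table, columns):
    """Build a parameterized Phoenix UPSERT statement for the given columns."""
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"UPSERT INTO {table} ({column_list}) VALUES ({placeholders})"


# Hypothetical table and column names, used only for illustration.
statement = build_upsert("USERTABLE", ["YCSB_KEY", "FIELD0", "FIELD1"])
print(statement)
```

Because UPSERT writes the row whether or not it already exists, the same statement shape covers both the insert and the update operations of a workload.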
In conclusion, there are a number of tools available for understanding the performance of Apache Phoenix. For any user, having a representative benchmark for your specific workloads is an extremely important tool in running a cluster. These kinds of benchmarks let you evaluate the performance of your cluster as you change application and operating-system configurations. Benchmarks do require a bit of effort to understand what the results report. The results should always be looked at critically to ensure the numbers are sensible and that you understand why the results are what they are. All users, whether new or old, should strongly consider investing time into finding the right benchmark for their Apache Phoenix application if they do not already have one.