
Here we demonstrate how to run YCSB workloads for HBase performance testing. The YCSB benchmarking toolkit is available here.

  1. Download YCSB. To download YCSB, run:
    curl -O --location
  2. Untar YCSB. Untar the downloaded file:
    tar xfvz ycsb-0.17.0.tar.gz
  3. Create the HBase table. Open the HBase shell and create the table usertable.
    1. To open the shell, run:
      hbase shell
    2. Create the table 'usertable' with the column family 'cf':
      1. hbase(main):001:0> n_splits = 200 # HBase recommends (10 * number of regionservers)
      2. hbase(main):002:0> create 'usertable', 'cf', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}
  4. Set JAVA_HOME. Check whether JAVA_HOME is already set:
    echo $JAVA_HOME
  5. If JAVA_HOME is not set, YCSB reports an error. On Cloudera clusters, the JRE path may be /usr/java/default. Once you find the path, set JAVA_HOME as follows:
    export JAVA_HOME=<path-to-jre>
  6. Load the data. Before you can run a workload, you must first load the data. To load it, do the following:
    1. Change directory to where YCSB is unpacked.
      cd ycsb-0.17.0
    2. Run the following command:
      nohup ./bin/ycsb load hbase20 -p columnfamily=cf  -p recordcount=15000000 -p fieldcount=10 -p fieldlength=100 -p workload=site.ycsb.workloads.CoreWorkload -threads 2 -s
    3. Parameters used here:
      • -p fieldcount => 10 (fields per record)
      • -p fieldlength => 100 (bytes per field)
      • -p recordcount => 15000000 (15 million). Adjust this to the size of your cluster and storage.
      • -p columnfamily => cf (the same column family you created in your table)
      • -threads => 2
      • Choose the number of threads based on the number of CPUs and region servers.
      • If you have 3 region servers, use at least 3 threads to maximize parallelism.
      • -s prints periodic status while the workload runs.
      • -cp can be used to add a directory to the classpath for the run.
  7. Choose the appropriate workload. YCSB ships with 6 default workloads:
    • Workload A: 50% read, 50% update
    • Workload B: 95% read, 5% update
    • Workload C: 100% read
    • Workload D: read/update/insert ratio 95/0/5
    • Workload E: scan/insert ratio 95/5
    • Workload F: read/read-modify-write ratio 50/50
  8. Run the YCSB workload.
    For example, to run Workload A for 15 minutes (900 seconds), start the run with:
    cd ycsb-0.17.0
    ./bin/ycsb run hbase20 -p columnfamily=cf  -P workloads/workloada -p requestdistribution=zipfian -p operationcount=15000000 -p recordcount=15000000 -p maxexecutiontime=900 -threads 2 -s
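The load command in step 6 and the run command in step 8 must agree on recordcount and columnfamily. The following is a minimal sketch of a dry-run helper that prints both commands from shared variables. The function name ycsb_cmd and the variable names are hypothetical and not part of YCSB; the flags are the ones used in the steps above.

```shell
# Hypothetical helper: prints (does not execute) the YCSB command for a phase.
# RECORDCOUNT, THREADS and COLUMNFAMILY are illustrative variable names.
RECORDCOUNT=15000000
THREADS=2
COLUMNFAMILY=cf

ycsb_cmd() {
  phase="$1"   # expected values: "load" or "run"
  case "$phase" in
    load)
      echo "./bin/ycsb load hbase20 -p columnfamily=${COLUMNFAMILY} -p recordcount=${RECORDCOUNT} -p fieldcount=10 -p fieldlength=100 -p workload=site.ycsb.workloads.CoreWorkload -threads ${THREADS} -s"
      ;;
    run)
      echo "./bin/ycsb run hbase20 -p columnfamily=${COLUMNFAMILY} -P workloads/workloada -p requestdistribution=zipfian -p operationcount=${RECORDCOUNT} -p recordcount=${RECORDCOUNT} -p maxexecutiontime=900 -threads ${THREADS} -s"
      ;;
    *)
      echo "usage: ycsb_cmd load|run" >&2
      return 1
      ;;
  esac
}

# Print both commands so they can be reviewed before running them by hand.
ycsb_cmd load
ycsb_cmd run
```

Changing RECORDCOUNT or THREADS in one place keeps the two phases consistent; the printed lines can then be pasted into the shell from the ycsb-0.17.0 directory.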

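The SPLITS expression in step 3 can be hard to read at a glance. As a rough illustration, the sketch below reproduces its integer arithmetic in plain shell so the generated split keys can be previewed outside the HBase shell. The function name split_keys is hypothetical; the formula is the one from the create command.

```shell
# Prints the split keys produced by the expression
#   "user#{1000+i*(9999-1000)/n_splits}" for i = 1..n_splits
# split_keys is an illustrative name; both Ruby and shell use integer division here.
split_keys() {
  n_splits="$1"
  i=1
  while [ "$i" -le "$n_splits" ]; do
    echo "user$((1000 + i * (9999 - 1000) / n_splits))"
    i=$((i + 1))
  done
}

split_keys 200 | sed -n '1,3p'   # prints user1044, user1089, user1134
split_keys 200 | sed -n '$p'     # prints user9999
```

With n_splits = 200 this yields 200 split points, which means the table is pre-split into 201 regions.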


Note: It is best to run YCSB from a dedicated node. If you instead run it from a cluster node, such as a master or region server node, ensure the following; otherwise, YCSB will be unable to run successfully:

  • The node you are running from has ZooKeeper.
  • hbase-site.xml (usually in the /etc/hbase/conf directory) is copied to the ycsb-0.17.0/hbase20-binding/conf directory.

Watchouts: Common errors seen with YCSB

Sometimes the YCSB data load or workload runs, but the status output shows 0 operations (insert/read/update) completed, with an estimated completion time of 106751991167300 days 15 hours.

Error message:


2020-02-04 23:00:09:424 20 sec: 0 operations; est completion in 106751991167300 days 15 hours
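The estimate in this message is not a real projection. As a rough check, and assuming the remaining time is tracked in seconds (an inference from the arithmetic, not something this article states), the figure matches the largest signed 64-bit integer, which is what one would expect when throughput is zero and the estimate overflows:

```shell
# Largest signed 64-bit integer (2^63 - 1), treated here as a number of seconds.
# Assumption: the shell uses 64-bit arithmetic, as bash and dash do on 64-bit systems.
max_long=9223372036854775807
days=$((max_long / 86400))            # whole days
hours=$((max_long % 86400 / 3600))    # leftover whole hours
echo "${days} days ${hours} hours"    # prints "106751991167300 days 15 hours"
```

In other words, the line should be read as "no operations are completing", not as a slow but progressing run.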


To solve this, tell the HBase client where your HBase configuration is. There are two ways to do this:

  1. Point to the directory that contains hbase-site.xml (/etc/hbase/conf) using the classpath parameter (-cp):
    ./bin/ycsb run hbase20 -cp /etc/hbase/conf -p columnfamily=cf -s -P workloads/workloada
  2. Create a conf directory and copy your cluster’s hbase-site.xml to it.
    1. cd ycsb-0.17.0
    2. mkdir -p hbase20-binding/conf
    3. cp /etc/hbase/conf/hbase-site.xml hbase20-binding/conf
    4. By default, the hbase20-binding/conf directory is added to the classpath. If it is not, add it on the command line with the -cp option (-cp hbase20-binding/conf).
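Before starting a long load, it can help to confirm that the configuration file is where YCSB will look for it. The sketch below is one possible pre-flight check. The function name check_hbase_conf is hypothetical, and the check for hbase.zookeeper.quorum assumes your cluster's hbase-site.xml sets that property, which may not hold for every deployment.

```shell
# Hypothetical pre-flight check; check_hbase_conf is an illustrative name.
# Succeeds only if <dir>/hbase-site.xml exists and mentions
# hbase.zookeeper.quorum (assumed to be set in the cluster's client config).
check_hbase_conf() {
  conf_dir="$1"
  site="$conf_dir/hbase-site.xml"
  if [ ! -f "$site" ]; then
    echo "missing: $site" >&2
    return 1
  fi
  if ! grep -qF "hbase.zookeeper.quorum" "$site"; then
    echo "no hbase.zookeeper.quorum in $site" >&2
    return 2
  fi
  echo "ok: $site"
}

# Run from the ycsb-0.17.0 directory.
check_hbase_conf hbase20-binding/conf || echo "fix the configuration before running YCSB"
```

The same function can be pointed at /etc/hbase/conf when using the -cp approach from option 1.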

About YCSB 

YCSB (Yahoo! Cloud Serving Benchmark) is an open-source specification and program suite for evaluating the retrieval and maintenance capabilities of computer programs. It is widely used to compare the relative performance of NoSQL database management systems.
