Created on 12-14-2020 02:20 PM - edited on 10-04-2021 09:08 PM by subratadas
This article demonstrates how to run YCSB workloads for HBase performance testing. The YCSB benchmarking toolkit is available on GitHub (https://github.com/brianfrankcooper/YCSB).
- Download YCSB. To download YCSB, run:
curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17.0.tar.gz
- Untar YCSB. Untar the downloaded file:
tar xfvz ycsb-0.17.0.tar.gz
- Create HBase table. Run the HBase shell and create the table usertable.
- To open the shell, use the command:
hbase shell
- Create the table 'usertable' with the column family 'cf', pre-split into regions:
hbase(main):001:0> n_splits = 200 # HBase recommends (10 * number of regionservers)
hbase(main):002:0> create 'usertable', 'cf', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}
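To see which split points the create command above generates, the same integer arithmetic can be reproduced outside the HBase shell. The following is a minimal bash sketch, not part of YCSB or HBase. The `split_keys` function name is made up for illustration, and the sketch assumes the shell expression uses integer division, as bash does.

```shell
#!/usr/bin/env bash
# Print the row-key split points produced by the expression
#   "user#{1000+i*(9999-1000)/n_splits}"  for i in 1..n_splits.
# split_keys is a hypothetical helper name, not an HBase or YCSB command.
split_keys() {
  local n_splits=$1 i
  for ((i = 1; i <= n_splits; i++)); do
    echo "user$((1000 + i * (9999 - 1000) / n_splits))"
  done
}

# Show the first and the last split point for 200 splits.
split_keys 200 | head -n 1   # user1044
split_keys 200 | tail -n 1   # user9999
```

With 200 split points, the table starts with 201 regions whose boundaries are spread across the user1000 to user9999 key prefix range.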
- Set Java Home. Check whether JAVA_HOME is already set:
echo $JAVA_HOME
- If JAVA_HOME is not set, YCSB reports an error. On Cloudera, the path to the JRE could be /usr/java/default. Once you find the path, set JAVA_HOME as follows:
export JAVA_HOME=<path-to-jre>
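The check and the export above can be combined so that an existing JAVA_HOME is never overwritten. This is a small sketch only: /usr/java/default is the example location mentioned above and is an assumption that may not hold on your system.

```shell
#!/usr/bin/env bash
# Set JAVA_HOME only when it is not already set.
# /usr/java/default is an assumed example path; replace it with your JRE path.
if [ -z "${JAVA_HOME:-}" ]; then
  export JAVA_HOME=/usr/java/default
fi

# Warn (rather than fail) if no java executable is found under JAVA_HOME.
if [ -x "$JAVA_HOME/bin/java" ]; then
  echo "Using Java from $JAVA_HOME"
else
  echo "Warning: no executable at $JAVA_HOME/bin/java; adjust JAVA_HOME" >&2
fi
```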
- Load the data. Before you can run a workload, you must first load the data. To load the data, do the following:
- Change directory to where YCSB is unpacked.
cd ycsb-0.17.0
- Run the following command:
nohup ./bin/ycsb load hbase20 -p columnfamily=cf -p recordcount=15000000 -p fieldcount=10 -p fieldlength=100 -p workload=site.ycsb.workloads.CoreWorkload -threads 2 -s
- Parameters used here:
- -p parameter fieldcount => 10
- -p parameter fieldlength => 100 (bytes)
- -p parameter recordcount => 15000000 (15 million). Adjust this based on the size of your cluster and its storage.
- -p parameter columnfamily => cf (the same column family you created in your table)
- -threads => 2
- Choose the number of threads based on the number of CPUs and region servers.
- If you have 3 region servers, use at least 3 threads to maximize parallelism.
- -s prints periodic status updates for the workload.
- -cp can be used to add entries to the classpath for the run.
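When adjusting recordcount to the size of your cluster, it helps to estimate how much data the load will write. The sketch below multiplies the three parameters from the load command above. It counts field values only. Row keys, column qualifiers, per-cell HBase overhead, compression, and HDFS replication are not included, so treat the result as a rough lower bound rather than an exact figure.

```shell
#!/usr/bin/env bash
# Rough size of the field values written by the load phase.
recordcount=15000000   # number of rows
fieldcount=10          # fields (columns) per row
fieldlength=100        # bytes per field value

raw_bytes=$((recordcount * fieldcount * fieldlength))
raw_gb=$((raw_bytes / 1000000000))

echo "Approximate raw value data: ${raw_bytes} bytes (~${raw_gb} GB)"
# Approximate raw value data: 15000000000 bytes (~15 GB)
```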
- Choose the appropriate workload. YCSB ships with six default workloads:
- Workload A: 50% read and 50% update
- Workload B: 95% read and 5% update
- Workload C: 100% read
- Workload D: read/update/insert ratio of 95/0/5
- Workload E: scan/insert ratio of 95/5
- Workload F: read/read-modify-write ratio of 50/50
- Run YCSB workload.
For example, to run Workload A for 15 min (900 sec), start the workload run using:
cd ycsb-0.17.0
./bin/ycsb run hbase20 -p columnfamily=cf -P workloads/workloada -p requestdistribution=zipfian -p operationcount=15000000 -p recordcount=15000000 -p maxexecutiontime=900 -threads 2 -s
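To run several workloads back to back with the same settings, the command above can be generated per workload. The following is a dry-run sketch: `ycsb_run_cmd` is a hypothetical helper that only prints the command, and it leaves out requestdistribution on the assumption that each workload file should keep its own setting.

```shell
#!/usr/bin/env bash
# Build the YCSB run command for one default workload (a-f).
# ycsb_run_cmd is a hypothetical helper; the flags mirror the command above.
ycsb_run_cmd() {
  local workload=$1
  echo "./bin/ycsb run hbase20 -p columnfamily=cf -P workloads/workload${workload}" \
       "-p operationcount=15000000 -p recordcount=15000000" \
       "-p maxexecutiontime=900 -threads 2 -s"
}

# Dry run: print the commands for workloads A, B, and C without executing them.
for w in a b c; do
  ycsb_run_cmd "$w"
done
```

Once the printed commands look right, one of them can be executed from the ycsb-0.17.0 directory, for example with `ycsb_run_cmd a | sh`.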
Note: It is best to run YCSB from a dedicated node rather than from a cluster node that is a master or region server. In either case, ensure one of the following; otherwise, YCSB will be unable to run successfully:
- The node you are running from has ZooKeeper
OR
- hbase-site.xml (usually in the /etc/hbase/conf directory) is copied to the ycsb-0.17.0/hbase20-binding/conf directory
Watchouts: Common errors seen with YCSB
Sometimes the YCSB data load or workload appears to run, but the status output reports 0 operations (inserts, reads, or updates) and an estimated completion time of 106751991167300 days 15 hours.
Error message:
2020-02-04 23:00:09:424 20 sec: 0 operations; est completion in 106751991167300 days 15 hours
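If the status output is captured to a file, this condition can be detected automatically. The sketch below is an illustration only: `ycsb_stalled` is a hypothetical helper, the pattern is derived from the sample status line above, and the log file here is a temporary file created for the demonstration.

```shell
#!/usr/bin/env bash
# Return success (0) if the most recent status line in a captured YCSB log
# reports 0 operations after a non-zero number of seconds.
# ycsb_stalled is a hypothetical helper, not part of YCSB.
ycsb_stalled() {
  grep 'operations;' "$1" | tail -n 1 | grep -Eq ' [1-9][0-9]* sec: 0 operations;'
}

# Demonstrate on a temporary file holding the sample line from above.
log=$(mktemp)
echo '2020-02-04 23:00:09:424 20 sec: 0 operations; est completion in 106751991167300 days 15 hours' > "$log"
if ycsb_stalled "$log"; then
  echo "YCSB is not completing any operations; check the HBase client configuration"
fi
rm -f "$log"
```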
To solve this, tell the HBase client where your HBase configuration is. There are two ways to do this:
- Point to the directory that contains hbase-site.xml (/etc/hbase/conf) using the classpath parameter (-cp).
Example:
./bin/ycsb run hbase20 -cp /etc/hbase/conf -p columnfamily=cf -s -P workloads/workloada
- Create a conf directory and copy your cluster’s hbase-site.xml to it:
cd ycsb-0.17.0
mkdir -p hbase20-binding/conf
cp /etc/hbase/conf/hbase-site.xml hbase20-binding/conf
- By default, the hbase20-binding/conf directory is added to the classpath. If it is not, add it on the command line using the -cp option (-cp hbase20-binding/conf).
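The copy steps above can be wrapped in a small check that is safe to run repeatedly before each benchmark. This is a sketch under stated assumptions: `ensure_hbase_conf` is a hypothetical helper, and the two paths passed to it are the ones used in this article, which may differ on your cluster.

```shell
#!/usr/bin/env bash
# Copy hbase-site.xml into the binding's conf directory if it is not there yet.
# ensure_hbase_conf is a hypothetical helper; both paths are arguments so the
# locations used in this article can be overridden.
ensure_hbase_conf() {
  local src=$1 dest_dir=$2
  if [ -f "$dest_dir/hbase-site.xml" ]; then
    echo "hbase-site.xml already present in $dest_dir"
  elif [ -f "$src" ]; then
    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/"
    echo "Copied $src to $dest_dir"
  else
    echo "Cannot find $src; locate your cluster's hbase-site.xml" >&2
    return 1
  fi
}

# Run from inside ycsb-0.17.0; do not abort the shell if the file is missing.
ensure_hbase_conf /etc/hbase/conf/hbase-site.xml hbase20-binding/conf || true
```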
About YCSB
YCSB (Yahoo! Cloud Serving Benchmark) is an open-source specification and program suite for evaluating the retrieval and maintenance capabilities of computer programs. It is widely used to compare the relative performance of NoSQL database management systems.