12-14-2020
02:20 PM
Here we demonstrate how to run YCSB workloads for HBase performance testing. The YCSB benchmarking toolkit is available on GitHub at https://github.com/brianfrankcooper/YCSB.
Download YCSB. To download YCSB, run: curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17.0.tar.gz
Untar YCSB. Untar the downloaded file: tar xfvz ycsb-0.17.0.tar.gz
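The download and untar steps can be combined into one small helper. This is only a sketch: the variable names and the fetch_ycsb function are made up for illustration, and the URL is the same one used in the download step.

```shell
# Version and URL taken from the download step; variable names are illustrative.
YCSB_VERSION="0.17.0"
YCSB_TARBALL="ycsb-${YCSB_VERSION}.tar.gz"
YCSB_URL="https://github.com/brianfrankcooper/YCSB/releases/download/${YCSB_VERSION}/${YCSB_TARBALL}"

# Download the tarball only if it is not already present, then unpack it.
fetch_ycsb() {
  if [ ! -f "$YCSB_TARBALL" ]; then
    curl -O --location "$YCSB_URL" || return 1
  fi
  tar -xzf "$YCSB_TARBALL"
}
```

Calling fetch_ycsb from the directory where you want YCSB installed should leave you with a ycsb-0.17.0 directory.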
Create HBase table. Run HBase shell and create table usertable.
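To open the HBase shell, run: hbase shell
A basic table with one column family is created with: create 'usertable', 'cf'
To pre-split the table across region servers, use the following two commands instead of the basic create: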
hbase(main):001:0> n_splits = 200 # HBase recommends (10 * number of regionservers)
hbase(main):002:0> create 'usertable', 'cf', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}
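To see which split keys the expression above produces, the same integer arithmetic can be reproduced outside the HBase shell. This is a sketch only; split_keys is a hypothetical helper name, not part of HBase or YCSB.

```shell
# Print the split keys generated by the HBase shell expression above.
# Uses integer arithmetic, the same as Ruby does for Integer operands.
split_keys() {
  n_splits="$1"
  i=1
  while [ "$i" -le "$n_splits" ]; do
    echo "user$((1000 + i * (9999 - 1000) / n_splits))"
    i=$((i + 1))
  done
}

# Show the first three split keys for n_splits = 200.
split_keys 200 | sed -n '1,3p'
```

With n_splits = 200 the keys run from user1044 to user9999. Since 200 split points divide the key space into 201 ranges, the table starts with 201 regions.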
Set Java Home. Check if JAVA_HOME is already set using echo $JAVA_HOME
If JAVA_HOME is not set, YCSB will fail with an error. On Cloudera clusters, the JRE path may be /usr/java/default. Once you find the path, set JAVA_HOME with: export JAVA_HOME=<path-to-jre>
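The JAVA_HOME check can be scripted so that a missing or wrong path is caught before YCSB starts. A minimal sketch, assuming a usable Java home contains an executable bin/java; check_java_home is a made-up helper name.

```shell
# Return 0 if the given directory looks like a usable Java home,
# that is, it is non-empty and contains an executable bin/java.
check_java_home() {
  [ -n "$1" ] && [ -x "$1/bin/java" ]
}

if check_java_home "${JAVA_HOME:-}"; then
  echo "JAVA_HOME is set to $JAVA_HOME"
else
  echo "JAVA_HOME is unset or has no bin/java; export it before running YCSB" >&2
fi
```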
Load the data. Before you can actually run the workload, you need to "load" the data first. To load data, do the following:
Change directory to where YCSB is unpacked. cd ycsb-0.17.0
Run the following command: nohup ./bin/ycsb load hbase20 -p columnfamily=cf -p recordcount=15000000 -p fieldcount=10 -p fieldlength=100 -p workload=site.ycsb.workloads.CoreWorkload -threads 2 -s
Parameters that were used here:
-p parameter fieldcount => 10
-p parameter fieldlength => 100 (bytes)
-p parameter recordcount => 15000000 (15 million). You can change this based on the size of your cluster and its storage
-p parameter columnfamily => cf (the same column family you created in your table)
-threads => 2
Choose the number of threads based on the number of CPUs and region servers
If you have 3 region servers, use at least 3 threads to maximize parallelism
-s prints periodic status updates while the workload runs
-cp adds a directory to the classpath for the run
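When choosing recordcount, it helps to estimate how much raw field data the load will write. The sketch below multiplies the three parameters from the load command; raw_value_bytes is a hypothetical helper name. The result counts field values only, so treat it as a lower bound: row keys, column qualifiers, HBase storage overhead, and HDFS replication all add to the on-disk size.

```shell
# Estimate the raw field data written by a YCSB load:
# recordcount * fieldcount * fieldlength (bytes).
raw_value_bytes() {
  echo $(( $1 * $2 * $3 ))
}

bytes=$(raw_value_bytes 15000000 10 100)
echo "raw field data: $bytes bytes (about $((bytes / 1000000000)) GB)"
```

For the parameters used here, this comes to 15,000,000,000 bytes, or about 15 GB of field values.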
Choose the appropriate workload. YCSB has 6 default workloads already available:
Workload A: 50% read, 50% update
Workload B: 95% read, 5% update
Workload C: 100% read
Workload D: 95% read, 0% update, 5% insert
Workload E: 95% scan, 5% insert
Workload F: 50% read, 50% read-modify-write
Run YCSB workload. For example, to run Workload A for 15 minutes (900 seconds), first change into the YCSB directory with cd ycsb-0.17.0 and then start the run with: ./bin/ycsb run hbase20 -p columnfamily=cf -P workloads/workloada -p requestdistribution=zipfian -p operationcount=15000000 -p recordcount=15000000 -p maxexecutiontime=900 -threads 2 -s
The run stops when either operationcount or maxexecutiontime is reached, whichever comes first.
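If you plan to run several workloads with the same settings, it can help to generate the command lines first and review them before running anything. The sketch below only prints commands; ycsb_run_cmd is a made-up helper, and every flag in it is copied from the run command above.

```shell
# Print (do not execute) a YCSB run command for one workload.
# $1 = workload letter (a..f), $2 = max run time in seconds, $3 = threads.
ycsb_run_cmd() {
  echo "./bin/ycsb run hbase20 -p columnfamily=cf -P workloads/workload$1 -p operationcount=15000000 -p recordcount=15000000 -p maxexecutiontime=$2 -threads $3 -s"
}

# Print the commands for the three workloads that do not insert new rows.
for w in a b c; do
  ycsb_run_cmd "$w" 900 2
done
```

Workloads D and E include inserts, so the table grows while they run. Keep that in mind when comparing their results with later runs against the same table.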
Note: It is best to run YCSB from a dedicated node. If you run it from a cluster node (a master or region server node), make sure one of the following is true; otherwise, YCSB will be unable to run successfully:
The node you are running from has ZooKeeper
OR
hbase-site.xml (usually in the /etc/hbase/conf directory) is copied to the ycsb-0.17.0/hbase20-binding/conf directory
Watchouts: Common errors seen with YCSB
Sometimes the YCSB data load or workload run starts, but the operation count (insert/read/update) stays at 0 and the estimated completion time is reported as 106751991167300 days 15 hours.
Error message:
2020-02-04 23:00:09:424 20 sec: 0 operations; est completion in 106751991167300 days 15 hours
To solve this, you need to let the HBase client know where your HBase configuration is, and this can be done in two ways:
Point to the directory that contains hbase-site.xml (/etc/hbase/conf) using the classpath parameter (-cp). Example: ./bin/ycsb run hbase20 -cp /etc/hbase/conf -p columnfamily=cf -s -P workloads/workloada
Create a conf directory and copy your cluster’s hbase-site.xml to it.
cd ycsb-0.17.0
mkdir -p hbase20-binding/conf
cp /etc/hbase/conf/hbase-site.xml hbase20-binding/conf
By default, the hbase20-binding/conf directory is added to the classpath. If it is not picked up, add it on the command line with the -cp option (-cp hbase20-binding/conf).
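Before starting a run, you can check which of the two locations actually contains hbase-site.xml. A minimal sketch, assuming the two directories named above; find_hbase_conf is a hypothetical helper name.

```shell
# Print the first directory in the argument list that contains hbase-site.xml.
# Returns non-zero if none of them does.
find_hbase_conf() {
  for dir in "$@"; do
    if [ -f "$dir/hbase-site.xml" ]; then
      echo "$dir"
      return 0
    fi
  done
  return 1
}

# Run from the ycsb-0.17.0 directory so the relative path resolves.
if conf_dir=$(find_hbase_conf hbase20-binding/conf /etc/hbase/conf); then
  echo "hbase-site.xml found; pass: -cp $conf_dir"
else
  echo "no hbase-site.xml found; copy it from the cluster first" >&2
fi
```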
About YCSB
YCSB (Yahoo! Cloud Serving Benchmark) is an open-source specification and program suite for evaluating the retrieval and maintenance capabilities of computer programs. It is widely used to compare the relative performance of NoSQL database management systems.