
How does the HBase performance evaluation tool (pe) work?

Hi,

I would like to know the details of the different options and commands available in the pe tool.

Specifically the number of rows, the number of clients, and the columns. How do they work?

1 REPLY

The best way to learn about various pe options is to run "hbase pe" without any options or commands:

$ hbase pe
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation <OPTIONS> [-D<property=value>]* <command> <nclients>
...

About nclients, I already replied to you in another question: this is the level of parallelism used to run the specified command. In the default MapReduce mode it means that 10*nclients mappers will be started. Here are the other options you asked about, plus a few others I use:

rows            Rows each client runs. Default: One million
columns         Columns to write per row. Default: 1
presplit        Create presplit table. Recommended for accurate perf analysis (see guide).  Default: disabled
compress        Compression type to use (GZ, LZO, ...). Default: 'NONE'
table           Alternate table name. Default: 'TestTable'
bloomFilter     Bloom filter type, one of [NONE, ROW, ROWCOL]
valueSize       Pass value size to use: Default: 1024

Example:

hbase pe --table=TestTable2 --compress=GZ --presplit=4 randomWrite 5

And of course, run one of the write commands first, followed by some reads. For the results, look for the following lines in the output of the MR job:

HBase Performance Evaluation
Elapsed time in milliseconds=492463
Row count=1048560
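
Those two counters are enough to work out a rough rate. A minimal sketch using the numbers above (the variable names are mine, pe does not print them). One assumption to flag: as far as I know both counters are job-wide totals, with the elapsed time added up across map tasks, so treat the result as closer to a per-task rate than a wall-clock one.

```shell
# Values copied from the job counters shown above.
elapsed_ms=492463
row_count=1048560

# rows per second = rows / (milliseconds / 1000)
# Assumption: elapsed_ms is summed over map tasks, so this is not a wall-clock rate.
rows_per_sec=$(awk -v r="$row_count" -v ms="$elapsed_ms" \
  'BEGIN { printf "%.0f", r / (ms / 1000) }')

echo "approx. ${rows_per_sec} rows/sec"
```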

You can also prepend "time" and run it as "time hbase pe ...". For more details search the web, though the results are scattered.
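
Putting the pieces together, a write-then-read run could look roughly like the sketch below. randomWrite and randomRead are pe commands; the run helper and the DRY_RUN switch are hypothetical additions of mine, so that the script only prints the commands unless you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: populate a table with a write test, then read it back.
# DRY_RUN and run() are illustrative helpers, not part of hbase pe.
DRY_RUN="${DRY_RUN:-1}"
TABLE="TestTable2"   # same alternate table name as in the example above
NCLIENTS=5

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"        # only show what would be executed
  else
    time "$@"        # wall-clock timing via "time", as mentioned above
  fi
}

# 1. Write first so that the table exists and holds data.
run hbase pe --table="$TABLE" --compress=GZ --presplit=4 randomWrite "$NCLIENTS"

# 2. Then read from the same table.
run hbase pe --table="$TABLE" randomRead "$NCLIENTS"
```

Run it once as-is to check the printed commands, then again with DRY_RUN=0 on a node where the hbase client is configured.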