How does the HBase performance evaluation tool (pe) work?


I want to know the details of the different options and commands available in the pe tool.

Specifically, the number of rows, the number of clients, and the columns: how do they work?


The best way to learn about various pe options is to run "hbase pe" without any options or commands:

$ hbase pe
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation <OPTIONS> [-D<property=value>]* <command> <nclients>

As for nclients, I already answered this in your other question: it is the level of parallelism used to run the specified command. In the default MapReduce mode, this means that 10*nclients mappers will be started. As for the other options you asked about, here they are along with a few others I use:

rows            Rows each client runs. Default: One million
columns         Columns to write per row. Default: 1
presplit        Create presplit table. Recommended for accurate perf analysis (see guide).  Default: disabled
compress        Compression type to use (GZ, LZO, ...). Default: 'NONE'
table           Alternate table name. Default: 'TestTable'
bloomFilter     Bloom filter type, one of [NONE, ROW, ROWCOL]
valueSize       Pass value size to use: Default: 1024
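To make the nclients and rows arithmetic concrete, here is a minimal shell sketch. `pe_task_layout` is a hypothetical helper, not part of HBase; it only restates the numbers above (10*nclients mappers, `rows` rows per client) and additionally assumes that each client's rows are divided evenly across its 10 map tasks, which may not hold for every HBase version.

```shell
#!/usr/bin/env bash
# Illustrative arithmetic for MapReduce-mode pe:
# each of the nclients clients runs `rows` rows, and 10 * nclients mappers start.
# ASSUMPTION: each client's rows are split evenly across its 10 map tasks.
pe_task_layout() {
  local nclients=$1 rows=$2
  local mappers=$(( 10 * nclients ))
  local total_rows=$(( nclients * rows ))
  local rows_per_mapper=$(( rows / 10 ))   # integer division, remainder dropped
  echo "mappers=$mappers total_rows=$total_rows rows_per_mapper=$rows_per_mapper"
}
```

For example, `pe_task_layout 5 1000000` prints `mappers=50 total_rows=5000000 rows_per_mapper=100000`: raising nclients adds mappers and total rows, while `rows` controls how much each client does.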


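For example, to run randomWrite with 5 clients against a GZ-compressed, presplit table named TestTable2:

hbase pe --table=TestTable2 --compress=GZ --presplit=4 randomWrite 5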

Of course, first run one of the write commands, followed by some reads. In the output of the MR job, look for the following lines:

HBase Performance Evaluation
Elapsed time in milliseconds=492463
Row count=1048560
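To turn those two lines into a throughput figure, a small filter can pull them out of the job output. This is a sketch: `pe_throughput` is a hypothetical name, and it assumes the lines appear as shown (possibly indented, as counter lines often are).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read pe / MR job output on stdin and print rows per second,
# based on the "Elapsed time in milliseconds=" and "Row count=" lines.
pe_throughput() {
  awk -F'=' '
    /Elapsed time in milliseconds=/ { ms = $2 + 0 }
    /Row count=/                    { rows = $2 + 0 }
    END {
      # integer rows per second (fraction truncated)
      if (ms > 0) printf "%d\n", rows * 1000 / ms
      else { print "pe_throughput: no elapsed time found" > "/dev/stderr"; exit 1 }
    }'
}
```

For the sample above (1048560 rows in 492463 ms) it prints 2129, i.e. roughly 2,129 rows per second. You could pipe a run into it with something like `hbase pe randomWrite 5 2>&1 | pe_throughput`, redirecting stderr as well in case the counters are printed there.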

You can also prepend "time", i.e. run "time hbase pe ...", to get the wall-clock time of the whole run. For more details, search the web, though the information is scattered.
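The write-then-read advice and the "time" tip can be combined in a small wrapper. A minimal sketch under stated assumptions: `pe_write_then_read` and the `PE` variable are hypothetical names, not part of HBase; the write pass reuses the flags from the example command, `randomRead` is used as the read workload, and the table options are passed only to the write pass on the assumption that it creates the table.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: time a write pass, then a read pass, on the same table.
# PE defaults to "hbase pe"; set PE="echo hbase pe" to only print the commands.
pe_write_then_read() {
  local table=$1 nclients=$2
  local pe="${PE:-hbase pe}"   # left unquoted below so it splits into command + args
  # Write first so the read pass has data; table options only go to this pass.
  time $pe --table="$table" --compress=GZ --presplit=4 randomWrite "$nclients" || return 1
  # Then read back with the same number of clients.
  time $pe --table="$table" randomRead "$nclients"
}
```

Usage: `pe_write_then_read TestTable2 5` runs both passes, and bash's `time` prints the elapsed time of each to stderr; `PE="echo hbase pe" pe_write_then_read TestTable2 5` is a dry run that just shows the two commands.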