Created 04-14-2017 03:01 PM
(HBase is installed on a CentOS machine.
I am fetching HBase data from my Windows 10 computer.)
I am running some performance tests on the HBase Java client, Thrift, and REST interfaces.
I have a table called “Airline” which has 500K rows.
I am fetching all 500K rows from the table through four different Java programs (using the Java client, Thrift, Thrift2, and REST).
Following are the performance numbers with various fetch sizes.
For all of these, the batch size is set to 100000.
| Fetch size (rows) | 1000   | 2000  | 5000  | 7500  | 10000 | 15000 | 20000 |
| REST              | 135923 | 67520 | 31293 | 22417 | 18210 | 14281 | 12348 |
| Thrift            | 135912 | 78630 | 38525 | 32470 | 27617 | 25223 | 27127 |
| Thrift2           | 133807 | 74559 | 39691 | 32457 | 28241 | 27189 | 25426 |
| Java API          | 45086  | 43945 | 44591 | 45393 | 44936 | 45849 | 45060 |
I can see that there is a performance improvement as the fetch size increases for REST, Thrift, and Thrift2.
But with the Java API, performance is roughly constant, irrespective of the fetch size.
Why does the fetch size have no impact with the Java client?
Here is a snippet of my Java program:
---------------------------------------
Table table = conn.getTable(TableName.valueOf("Airline"));
Scan scan = new Scan();
ResultScanner scanner = table.getScanner(scan);
// Keep pulling chunks of fetchSize rows until the scanner is exhausted
for (Result[] result = scanner.next(fetchSize); result.length != 0; result = scanner.next(fetchSize))
{
    // process the rows
}
scanner.close();
---------------------------------------
Can someone help me with this? Am I using the wrong methods/classes for fetching data through the Java client?
Created 04-15-2017 01:34 AM
A few follow-up questions:
1. How do I enable caching on the Java client?
I tried scan.setCaching(Integer.MAX_VALUE); scan.setCacheBlocks(true); but I did not see any difference in performance on subsequent runs.
2. I shut everything down and retried REST with a fetch size of 20000, and it is still faster than the Java client.
3. Why is the fetch size not taking effect with the Java client? Am I doing anything wrong in my program?
Created 04-16-2017 01:37 AM
Use Scan.setBatch(int) to control the number of records fetched per RPC with the Java API. The call you are making only wraps calls to ResultScanner.next(); it does not affect the underlying RPCs. You may also have to increase hbase.client.scanner.max.result.size, as this caps the amount of data returned in a single RPC (default 2MB).
The Thrift and REST servers do NOT cache results. Please disregard the comment which asserts this.
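For illustration, here is a minimal sketch of a scan configured along those lines, assuming the standard HBase 1.x client API; the connection setup, the 10 MB cap, and the caching/batch values are example assumptions for this sketch, not tuned recommendations:
---------------------------------------
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

Configuration conf = HBaseConfiguration.create();
// Raise the per-RPC result size cap (default 2MB); example value only
conf.setLong("hbase.client.scanner.max.result.size", 10L * 1024 * 1024);

try (Connection conn = ConnectionFactory.createConnection(conf);
     Table table = conn.getTable(TableName.valueOf("Airline"))) {
    Scan scan = new Scan();
    scan.setCaching(10000);     // rows buffered per RPC from the RegionServer
    scan.setCacheBlocks(true);  // allow the server to cache blocks read by this scan
    scan.setBatch(100);         // max cells per Result; mainly matters for very wide rows
    try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result result : scanner) {
            // process each row
        }
    }
}
---------------------------------------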
Created 04-17-2017 04:43 PM
Thanks for your reply, Josh Elser.
scan.setMaxResultSize() is set to 10 MB.
I tried setting Scan.setBatch() with different values, but I did not see any variation in performance. For any batch size, the performance is consistent; I did not see any improvement with a higher batch size.
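For reference, a rough sketch of how those two settings are applied to the scan from my earlier snippet (the batchSize variable stands for whatever value I varied per run; this is an illustration, not my exact program):
---------------------------------------
Scan scan = new Scan();
scan.setMaxResultSize(10L * 1024 * 1024); // 10 MB cap on data returned per scanner RPC
scan.setBatch(batchSize);                 // varied across runs
ResultScanner scanner = table.getScanner(scan);
---------------------------------------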
Created 04-18-2017 05:56 PM
Thanks for your reply, Josh Elser.
scan.setMaxResultSize() is set to 10 MB.
I tried setting Scan.setBatch() with different values, and I could see an improvement compared to earlier, but I did not see any variation in performance across different fetch sizes.
After setting scan.setMaxResultSize() to 10 MB+, the new performance numbers are as below:
| Fetch size (rows) | 1000  | 2000  | 5000  | 7500  | 10000 | 15000 | 20000 |
| Java API time     | 17692 | 17158 | 21524 | 21289 | 18802 | 18786 | 18786 |
For any batch size, the performance is almost consistent, whereas with REST I can see an improvement at higher fetch sizes.
Up to a batch size of 10000, the Java client looks good; above a batch size of 10000, REST looks better. Why?
What other parameters might be impacting this?