Why is the HBase Java client slow compared to the REST/Thrift APIs?

Explorer

(HBase is installed on a CentOS machine. I am fetching HBase data from my Windows 10 computer.)

I am running some performance tests on HBase Java client / Thrift / REST interface.

I have a table called “Airline” which has 500K rows.

I am fetching all 500K rows from the table through 4 different Java programs (using the Java client, Thrift, Thrift2, and REST).

The following are the performance numbers for various fetch sizes.

For all of these, the batch size is set to 100000.

Fetch size (number of rows)

              1000    2000    5000    7500   10000   15000   20000
REST        135923   67520   31293   22417   18210   14281   12348
Thrift      135912   78630   38525   32470   27617   25223   27127
Thrift2     133807   74559   39691   32457   28241   27189   25426
Java API     45086   43945   44591   45393   44936   45849   45060

With REST, Thrift, and Thrift2, performance improves as the fetch size increases.

With the Java API, however, performance stays the same regardless of fetch size.

Why does the fetch size have no effect in the Java client?

Here is a snippet of my Java program:

---------------------------------------
Table table = conn.getTable(TableName.valueOf("Airline"));
Scan scan = new Scan();
ResultScanner scanner = table.getScanner(scan);

// Pull up to fetchSize rows per call until the scanner is exhausted.
for (Result[] result = scanner.next(fetchSize); result.length != 0; result = scanner.next(fetchSize)) {
    // process the rows
}
---------------------------------------

Can someone help me with this? Am I using the wrong methods or classes to fetch data through the Java client?


Explorer

A few follow-up questions:

1. How do I enable caching on the Java client?

I tried scan.setCaching(Integer.MAX_VALUE) and scan.setCacheBlocks(true), but I did not see any difference in performance on subsequent runs.

2. I shut everything down and retried REST with a fetch size of 20000, and it is still faster than the Java client.

3. Why is the fetch size not taking effect in the Java client? Am I doing anything wrong in my program?

Super Guru

Use Scan.setBatch(int) to control the number of records fetched per RPC with the Java API. The call you are making, ResultScanner.next(int), only wraps repeated calls to ResultScanner.next(); it does not affect the underlying RPCs. You may also have to increase hbase.client.scanner.max.result.size, as this caps the amount of data returned in a single RPC (default 2 MB).

The Thrift and REST servers do NOT cache results. Please disregard the comment which asserts this.
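To illustrate the point about next(int) only wrapping next(), here is a minimal sketch of a client-side scanner that buffers rows. FakeServer and FakeScanner are hypothetical stand-ins written for this example, not HBase classes, and the model ignores byte-size limits. The `caching` parameter plays the role of the rows-per-RPC setting; as far as I know, in the HBase client that setting is Scan.setCaching(int), while Scan.setBatch(int) limits the number of cells per Result, so it is worth checking both in the API docs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Toy model only: FakeServer and FakeScanner are hypothetical stand-ins,
// not HBase classes, and this is not HBase's actual implementation.
class ScanRpcModel {

    /** Pretend server holding totalRows rows; counts simulated round trips. */
    static final class FakeServer {
        private final int totalRows;
        private int nextRow = 0;
        int rpcCount = 0;

        FakeServer(int totalRows) {
            this.totalRows = totalRows;
        }

        /** One simulated RPC returning at most `caching` rows. */
        List<Integer> fetch(int caching) {
            rpcCount++;
            List<Integer> rows = new ArrayList<>();
            while (rows.size() < caching && nextRow < totalRows) {
                rows.add(nextRow++);
            }
            return rows;
        }
    }

    /** Pretend client scanner that refills a local buffer one RPC at a time. */
    static final class FakeScanner {
        private final FakeServer server;
        private final int caching;
        private final ArrayDeque<Integer> cache = new ArrayDeque<>();
        private boolean exhausted = false;

        FakeScanner(FakeServer server, int caching) {
            this.server = server;
            this.caching = caching;
        }

        /** Returns the next row, or null once the scan is finished. */
        Integer next() {
            if (cache.isEmpty() && !exhausted) {
                List<Integer> rows = server.fetch(caching);
                exhausted = rows.size() < caching; // short reply means no more rows
                cache.addAll(rows);
            }
            return cache.poll();
        }

        /** Just a loop over next(); it never changes how many rows an RPC carries. */
        List<Integer> next(int n) {
            List<Integer> out = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                Integer row = next();
                if (row == null) {
                    break;
                }
                out.add(row);
            }
            return out;
        }
    }

    /** Scans everything in chunks of fetchSize and reports the simulated RPC count. */
    static int countRpcs(int totalRows, int caching, int fetchSize) {
        FakeServer server = new FakeServer(totalRows);
        FakeScanner scanner = new FakeScanner(server, caching);
        for (List<Integer> rows = scanner.next(fetchSize); !rows.isEmpty(); rows = scanner.next(fetchSize)) {
            // process the rows
        }
        return server.rpcCount;
    }

    public static void main(String[] args) {
        int[] fetchSizes = {1000, 5000, 20000};
        for (int caching : new int[] {100, 10000}) {
            for (int fetchSize : fetchSizes) {
                System.out.println("caching=" + caching + " fetchSize=" + fetchSize
                        + " rpcs=" + countRpcs(500_000, caching, fetchSize));
            }
        }
    }
}
```

In this model the round-trip count changes only when `caching` changes; varying the argument to next(int) leaves it the same, which matches the flat Java API row in the table above.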

Explorer

Thanks for your reply Josh Elser.

scan.setMaxResultSize() is set to 10 MB

I tried setting Scan.setBatch() with different values, but I did not see any variation in performance. For any batch size, performance is consistent; I did not see any improvement with a larger batch size.

Explorer

Thanks for your reply Josh Elser.

scan.setMaxResultSize() is set to 10 MB

I tried setting Scan.setBatch() with different values. There is an improvement compared to earlier, but I still did not see any variation in performance across different fetch sizes.

After setting scan.setMaxResultSize() to 10 MB or more, the new performance numbers are as follows:

Fetch size (number of rows)

                  1000    2000    5000    7500   10000   15000   20000
Java API time    17692   17158   21524   21289   18802   18786   18786

For any batch size, performance is almost constant, whereas with REST I see an improvement at higher fetch sizes.

Up to a batch size of 10000, the Java client looks good. Above 10000, REST looks better. Why?

What other parameters might be affecting this?
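One possible factor, sketched here only as a simplified model: if each RPC is limited both by a row count and by a byte size (as the max result size setting mentioned earlier suggests), the effective rows per RPC stop growing once the byte limit is hit. The average row size of 1 KB below is a made-up assumption for illustration, not a measurement of the Airline table, and the class and method names are hypothetical.

```java
// Simplified model of two competing per-RPC limits. The 1 KB average row
// size is an assumed value for illustration, not measured from real data.
class ScanSizeCapModel {

    /** Rows per RPC when both a row-count limit and a byte limit apply. */
    static long effectiveRowsPerRpc(long caching, long maxResultBytes, long avgRowBytes) {
        long rowsThatFit = Math.max(1, maxResultBytes / avgRowBytes);
        return Math.min(caching, rowsThatFit);
    }

    /** Number of RPCs needed to return totalRows (ceiling division). */
    static long rpcCount(long totalRows, long rowsPerRpc) {
        return (totalRows + rowsPerRpc - 1) / rowsPerRpc;
    }

    public static void main(String[] args) {
        long totalRows = 500_000;
        long maxResultBytes = 10L * 1024 * 1024; // 10 MB, as set in the post
        long avgRowBytes = 1024;                 // assumption: 1 KB per row

        long[] cachingValues = {1000, 2000, 5000, 7500, 10000, 15000, 20000};
        for (long caching : cachingValues) {
            long perRpc = effectiveRowsPerRpc(caching, maxResultBytes, avgRowBytes);
            System.out.println("caching=" + caching
                    + " rowsPerRpc=" + perRpc
                    + " rpcs=" + rpcCount(totalRows, perRpc));
        }
    }
}
```

Under these assumed numbers, raising the row limit past what fits in 10 MB no longer reduces the RPC count. Whether that is what happens here depends on the real row size, so measuring the average row size would be a reasonable next step.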