Member since 04-14-2017 | 5 Posts | 0 Kudos Received | 0 Solutions
04-18-2017 05:56 PM
Thanks for your reply, Josh Elser. scan.setMaxResultSize() is set to 10 MB. I tried setting Scan.setBatch() to different values and saw an improvement compared to earlier, but no variation in performance across fetch sizes. With scan.setMaxResultSize() set to 10 MB or more, the new performance numbers are:
Fetch size (rows)   1000    2000    5000    7500    10000   15000   20000
Java API time       17692   17158   21524   21289   18802   18786   18786
Performance is almost the same for every batch size, whereas with REST I see an improvement at higher fetch sizes. Up to a batch size of 10000 the Java client looks good; above 10000, REST looks better. Why? What other parameters might be affecting this?
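A rough way to reason about the remaining parameters, as a simplified model only (this is not HBase code): assume each scanner round trip returns at most `caching` rows (the value set with `Scan.setCaching()`) and at most `maxResultSize` bytes (`Scan.setMaxResultSize()`), whichever limit is reached first. The fixed per-row size is a hypothetical assumption; real rows vary and the server-side logic is more involved.

```java
/**
 * Simplified, hypothetical model of scanner round trips. Assumes every row has
 * the same serialized size and that a round trip stops when either the caching
 * limit (rows) or the max result size (bytes) is reached.
 */
final class ScanRpcModel {

    private ScanRpcModel() {}

    /** Rows returned per round trip: the smaller of the two limits, at least 1. */
    static long rowsPerRpc(int caching, long maxResultSizeBytes, long bytesPerRow) {
        if (caching < 1 || maxResultSizeBytes < 1 || bytesPerRow < 1) {
            throw new IllegalArgumentException("all limits must be positive");
        }
        // The model always returns at least one row so the scan makes progress.
        long bySize = Math.max(1, maxResultSizeBytes / bytesPerRow);
        return Math.min(caching, bySize);
    }

    /** Round trips needed to read totalRows under the model above. */
    static long rpcCount(long totalRows, int caching, long maxResultSizeBytes, long bytesPerRow) {
        long perRpc = rowsPerRpc(caching, maxResultSizeBytes, bytesPerRow);
        return (totalRows + perRpc - 1) / perRpc; // ceiling division
    }
}
```

For example, with a hypothetical 200-byte row and a 10 MB size cap, a caching value of 1000 gives 500 round trips for 500K rows, while `Integer.MAX_VALUE` gives 10, because the size cap then binds first. Raising one limit changes nothing while the other one is the binding constraint.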
04-17-2017 04:43 PM
Thanks for your reply, Josh Elser. scan.setMaxResultSize() is set to 10 MB. I tried setting Scan.setBatch() to different values, but I did not see any variation in performance: it is consistent for every batch size, with no improvement at higher batch sizes.
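My understanding is that `Scan.setBatch()` limits the number of cells (columns) returned per `Result`, not the number of rows, so it only matters for rows wider than the batch value. A small model of that idea follows; the column count used below is hypothetical, since the Airline schema is not given here, and treating `batch <= 0` as "no limit" is an assumption of the model.

```java
/**
 * Hypothetical model of Scan.setBatch(): a row with more cells than the batch
 * limit is split across several Result objects; narrower rows are unaffected.
 */
final class BatchModel {

    private BatchModel() {}

    /** Number of Result objects one row is split into for a given batch limit. */
    static int resultsPerRow(int cellsPerRow, int batch) {
        if (cellsPerRow < 0) {
            throw new IllegalArgumentException("cellsPerRow must be >= 0");
        }
        if (cellsPerRow == 0) {
            return 0;                         // empty row, nothing returned
        }
        if (batch <= 0) {
            return 1;                         // treated as "no limit" in this model
        }
        return (cellsPerRow + batch - 1) / batch; // ceiling division
    }

    /** Total Result objects for a scan over `rows` rows of equal width. */
    static long totalResults(long rows, int cellsPerRow, int batch) {
        return rows * resultsPerRow(cellsPerRow, batch);
    }
}
```

With, say, 30 columns per row and a batch of 100 or more, every row still arrives as a single `Result`, so changing the batch value would not be expected to change the scan time.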
04-15-2017 01:34 AM
A few follow-up questions:
1. How do I enable caching in the Java client? I tried scan.setCaching(Integer.MAX_VALUE); and scan.setCacheBlocks(true);, but I did not see any difference in performance on subsequent runs.
2. I shut everything down and tried REST with a fetch size of 20000, but it is still faster than the Java client.
3. Why does the fetch size not take effect in the Java client? Am I doing anything wrong in my program?
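To check whether later runs benefit from caching (for example the region server block cache, which `setCacheBlocks(true)` asks the scan to populate), one option is to time several runs in the same JVM and compare the first run with the later ones, since the first run also pays for connection setup and JVM warm-up. A minimal, generic harness; the workload passed in (for example, the scan loop) is a placeholder.

```java
/** Generic timing harness: runs a workload several times and records each run. */
final class RepeatedRunTimer {

    private RepeatedRunTimer() {}

    /** Runs the workload `runs` times and returns each run's elapsed time in ms. */
    static long[] timeRuns(int runs, Runnable workload) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        long[] elapsedMs = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();      // monotonic clock, suitable for intervals
            workload.run();
            elapsedMs[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return elapsedMs;
    }
}
```

If the later entries are not noticeably smaller than the first, the bottleneck is probably elsewhere (network transfer or client-side processing) rather than disk reads on the server.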
04-14-2017 03:01 PM
(HBase is installed on a CentOS machine; I am fetching HBase data from my Windows 10 computer.) I am running some performance tests on the HBase Java client, Thrift, and REST interfaces. I have a table called “Airline” with 500K rows, and I fetch all 500K rows from it with 4 different Java programs (using the Java client, Thrift, Thrift2, and REST). Below are the performance numbers for various fetch sizes. For all of these, the batch size is set to 100000.
Fetch size (rows)   1000     2000    5000    7500    10000   15000   20000
REST                135923   67520   31293   22417   18210   14281   12348
Thrift              135912   78630   38525   32470   27617   25223   27127
Thrift2             133807   74559   39691   32457   28241   27189   25426
Java API            45086    43945   44591   45393   44936   45849   45060
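One way to quantify the pattern in these numbers is the ratio of the time at fetch size 1000 to the time at fetch size 20000 for each interface. The ratio is unitless, so it does not depend on the time unit of the measurements, which the post does not state. The values are copied from the table; the class name is just for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Speedup from raising the fetch size from 1000 to 20000 rows, using the posted timings. */
final class FetchSizeSpeedup {

    private FetchSizeSpeedup() {}

    /** Timings per interface, ordered by fetch size 1000, 2000, 5000, 7500, 10000, 15000, 20000. */
    static Map<String, long[]> timings() {
        Map<String, long[]> t = new LinkedHashMap<>();
        t.put("REST",     new long[] {135923, 67520, 31293, 22417, 18210, 14281, 12348});
        t.put("Thrift",   new long[] {135912, 78630, 38525, 32470, 27617, 25223, 27127});
        t.put("Thrift2",  new long[] {133807, 74559, 39691, 32457, 28241, 27189, 25426});
        t.put("Java API", new long[] {45086, 43945, 44591, 45393, 44936, 45849, 45060});
        return t;
    }

    /** Ratio of the time at the smallest fetch size to the time at the largest. */
    static double speedup(long[] times) {
        if (times.length < 2 || times[times.length - 1] <= 0) {
            throw new IllegalArgumentException("need at least two timings, last one positive");
        }
        return (double) times[0] / times[times.length - 1];
    }
}
```

This gives roughly 11.0x for REST, 5.0x for Thrift, 5.3x for Thrift2, and 1.0x for the Java API, which is the flat behaviour the question is about.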
I see a performance improvement as the fetch size increases for REST, Thrift, and Thrift2, but with the Java API performance is consistent irrespective of fetch size. Why does the fetch size have no impact in the Java client? Here is a snippet of my Java program:
---------------------------------------
Scan scan = new Scan();
// try-with-resources closes the scanner and the table when the loop finishes
try (Table table = conn.getTable(TableName.valueOf("Airline"));
     ResultScanner scanner = table.getScanner(scan)) {
    for (Result[] result = scanner.next(fetchSize); result.length != 0;
            result = scanner.next(fetchSize)) {
        // process the rows
    }
}
--------------------------------------------------------
Can someone help me with this? Am I using the wrong methods/classes for fetching data through the Java client?
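To see why the fetch size may not matter, here is a stand-in scanner in plain Java (it is not the HBase API). It assumes, as I understand the HBase client to work, that `next(int nbRows)` simply calls `next()` repeatedly on the client side, while the number of rows fetched per round trip is governed by the scan's caching value. Under that assumption the round-trip count depends only on caching. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java stand-in for a client-side scanner (hypothetical, not HBase code).
 * Rows are pulled from a simulated server `caching` rows per round trip;
 * next(nbRows) only loops over next() and never changes the round-trip size.
 */
final class SimulatedScanner {
    private final int totalRows;
    private final int caching;
    private int nextRow;     // index of the next row handed to the caller
    private int fetchedUpTo; // rows transferred from the "server" so far
    private int rpcCount;    // simulated round trips

    SimulatedScanner(int totalRows, int caching) {
        if (totalRows < 0 || caching < 1) {
            throw new IllegalArgumentException("totalRows >= 0 and caching >= 1 required");
        }
        this.totalRows = totalRows;
        this.caching = caching;
    }

    /** Returns the next row index, or -1 when the scan is exhausted. */
    int next() {
        if (nextRow == fetchedUpTo) {               // client-side buffer is empty
            if (fetchedUpTo == totalRows) {
                return -1;                          // nothing left to fetch
            }
            fetchedUpTo = (int) Math.min((long) totalRows, (long) fetchedUpTo + caching);
            rpcCount++;                             // one simulated round trip
        }
        return nextRow++;
    }

    /** Client-side convenience: up to nbRows rows, fetched one next() at a time. */
    int[] next(int nbRows) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < nbRows; i++) {
            int row = next();
            if (row < 0) {
                break;
            }
            rows.add(row);
        }
        return rows.stream().mapToInt(Integer::intValue).toArray();
    }

    int rpcCount() {
        return rpcCount;
    }

    /** Drains a scan in chunks of fetchSize, like the loop in the post; returns round trips. */
    static int drain(int totalRows, int caching, int fetchSize) {
        SimulatedScanner scanner = new SimulatedScanner(totalRows, caching);
        for (int[] chunk = scanner.next(fetchSize); chunk.length != 0; chunk = scanner.next(fetchSize)) {
            // process the rows
        }
        return scanner.rpcCount();
    }
}
```

Here `drain(500000, 100, 1000)` and `drain(500000, 100, 20000)` both report 5000 simulated round trips; only changing the caching value (to 1000, giving 500) changes the count. If that assumption holds for the client version in use, `scan.setCaching(...)`, not the `next(fetchSize)` argument, would be the setting to vary in the benchmark.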
Labels: Apache HBase