HBASE concurrent scan requests performance issue

Hi,

I am using an HBase row-key range scan to retrieve data from HBase, and I can achieve 1 million records/sec.

We get this performance only when a single HBase scan runs against a table.

If we run 2 or more scans on an HBase table concurrently, scan performance degrades: the time each scan takes grows roughly in proportion to the number of concurrent scan threads.

Example: if I run a single range scan on the HBase table table1, the scan completes in 1.02 seconds and returns 1 million rows.

Likewise, if I run 2 scans on the same table concurrently, they take 1.8 seconds, and each returns the same result set (1 million rows each).

Interestingly, both scans finish at almost the same time (about 1.8 seconds), and the scan time keeps increasing as more concurrent scanners are added.
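A quick back-of-the-envelope check on the numbers above: dividing the total rows returned by the wall-clock time gives the aggregate throughput across all concurrent scans. This is only arithmetic on the figures reported in this post; `aggregate_rows_per_sec` is a hypothetical helper, not part of any HBase API.

```python
def aggregate_rows_per_sec(num_scans: int, rows_per_scan: int, wall_clock_sec: float) -> float:
    """Total rows returned by all concurrent scans divided by elapsed wall-clock time."""
    return num_scans * rows_per_scan / wall_clock_sec


ROWS_PER_SCAN = 1_000_000  # each scan returns 1 million rows (as reported)

single = aggregate_rows_per_sec(1, ROWS_PER_SCAN, 1.02)  # one scan, 1.02 s
double = aggregate_rows_per_sec(2, ROWS_PER_SCAN, 1.8)   # two concurrent scans, 1.8 s

print(f"1 scan : {single:,.0f} rows/s")  # → 1 scan : 980,392 rows/s
print(f"2 scans: {double:,.0f} rows/s")  # → 2 scans: 1,111,111 rows/s
```

Going from one scan to two, the total rate only rises from about 0.98 million to about 1.11 million rows/s, so the two scans appear to share roughly the same overall throughput rather than each getting its own 1 million rows/s. That hints at a shared limit somewhere along the read path, although these numbers alone do not say where.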

Ideally, when 2 or more scanners run concurrently on an HBase table, all of them should complete in about 1 second; I am not sure why they take longer.

At first I thought the issue was that scanners work through regions sequentially, and that I might be scanning data in the same region with 2 or more scan threads.

So I then ran 2 concurrent range-scan threads that retrieve data from 2 different regions (not different region servers) of the same HBase table.

But I still see the same issue and am not sure what is causing it.

Please let me know what I should do. I want to run 30 to 40 concurrent scans and still get the same performance (1 million rows/second for each scan).

If I run 20 concurrent range-scan threads, the scan time increases to around 17 to 20 seconds.
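The same arithmetic applied to the 20-thread case and to the 30-to-40-scan goal shows how far apart the observed and required aggregate rates are. The post does not state how many rows each of the 20 scans returns, so this sketch assumes 1 million rows per scan, as in the earlier examples; `aggregate_rows_per_sec` is again a hypothetical helper.

```python
def aggregate_rows_per_sec(num_scans: int, rows_per_scan: int, wall_clock_sec: float) -> float:
    """Total rows returned by all concurrent scans divided by elapsed wall-clock time."""
    return num_scans * rows_per_scan / wall_clock_sec


ROWS_PER_SCAN = 1_000_000  # assumption: each of the 20 scans also returns 1 million rows

# Reported range for 20 concurrent scans: roughly 17 to 20 seconds.
slowest = aggregate_rows_per_sec(20, ROWS_PER_SCAN, 20.0)  # 1,000,000 rows/s
fastest = aggregate_rows_per_sec(20, ROWS_PER_SCAN, 17.0)  # about 1,176,471 rows/s

# Goal: 30 to 40 concurrent scans, each still finishing in about 1 second.
target_low = aggregate_rows_per_sec(30, ROWS_PER_SCAN, 1.0)   # 30,000,000 rows/s
target_high = aggregate_rows_per_sec(40, ROWS_PER_SCAN, 1.0)  # 40,000,000 rows/s

print(f"observed with 20 scans: {slowest:,.0f} to {fastest:,.0f} rows/s")
print(f"needed for the goal   : {target_low:,.0f} to {target_high:,.0f} rows/s")
print(f"scale-up required     : {target_low / fastest:.1f}x to {target_high / slowest:.1f}x")
```

Under that assumption, 20 concurrent scans still deliver only about 1.0 to 1.18 million rows/s in total, whereas running 30 to 40 scans at 1 million rows/s each would require roughly 25 to 40 times that aggregate rate.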

P.S.: I have tested this on a single node and on a 3-node cluster; in both cases the performance is almost the same.

With regards

Ashok
