HBase timerange scan slow in HDP 3.0

Hi,

 

We recently upgraded from HDP 2.6.5 to HDP 3.1.0 and are now facing an issue with HBase time range scans. Scans over older dates take far longer than they did on the previous version, and the scan time grows steeply the further back the date goes.

 

Below is a summary of the issue on the NEW cluster (HDP 3.1.0) vs the OLD cluster (HDP 2.6.5) when scanning the same data volumes:

 

Scan command - scan '<TABLE>',TIMERANGE=>[1563903859000,1563903865000],FILTER=>"KeyOnlyFilter() AND FirstKeyOnlyFilter()"
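For anyone reproducing this: the TIMERANGE bounds are epoch milliseconds, with the lower bound inclusive and the upper bound exclusive. Below is a minimal Python sketch for deriving such bounds for a whole calendar day. It assumes UTC day boundaries and full-day windows, neither of which is stated in the post, and the helper names `day_timerange_ms` and `ms_to_utc` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def day_timerange_ms(day: str) -> Tuple[int, int]:
    """Return (start, end) epoch-millisecond bounds for a dd/mm/yyyy date.

    Assumption: the day is interpreted in UTC. The end bound is the start
    of the next day, matching TIMERANGE's exclusive upper bound.
    """
    start = datetime.strptime(day, "%d/%m/%Y").replace(tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


def ms_to_utc(ms: int) -> str:
    """Decode an epoch-millisecond value to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    lo, hi = day_timerange_ms("18/08/2019")
    # Print the bounds in the form used by the HBase shell scan above.
    print(f"TIMERANGE=>[{lo},{hi}]")
    # Decode the lower bound from the sample command in the post.
    print(ms_to_utc(1563903859000))
```

Decoded this way, the two bounds in the sample command are six seconds apart, so that command looks like a template rather than one of the full-day scans timed below.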

Total Rows in tables - ~32M, Size - 124 GB

 

NEW cluster (HDP 3.1.0) - (Scan run on 22/8/2019)

Cluster Config - 11 Worker (128 GB RAM, 16 Core), 11 Region Servers, 3 Master, 1 Manager, 1 Edge, Non-Kerberized

Scan for 21/8/2019 takes 2 secs, ~11K rows

Scan for 20/8/2019 takes 2 secs, ~11K rows

Scan for 19/8/2019 takes 70 secs, ~11K rows

Scan for 18/8/2019 takes 800 secs, ~11K rows

 

OLD cluster (HDP 2.6.5) -

Cluster Config - 3 Worker (128 GB RAM, 16 Core), 3 Region Servers, 2 Master, 1 Manager, 1 Edge, Kerberized

Scan for 21/8/2019 takes 27 secs, ~11K rows

Scan for 20/8/2019 takes 27 secs, ~11K rows

Scan for 19/8/2019 takes 41 secs, ~11K rows

Scan for 18/8/2019 takes 41 secs, ~11K rows
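To put the two sets of timings side by side, here is a small Python sketch that uses only the durations reported above. The dictionary layout and the `slowdown` helper are illustrative, not part of any HBase tooling.

```python
# Scan durations in seconds, copied from the figures reported above.
timings = {
    "21/8/2019": {"new": 2, "old": 27},
    "20/8/2019": {"new": 2, "old": 27},
    "19/8/2019": {"new": 70, "old": 41},
    "18/8/2019": {"new": 800, "old": 41},
}


def slowdown(new_secs: float, old_secs: float) -> float:
    """Ratio of new-cluster to old-cluster scan time (>1 means slower)."""
    return new_secs / old_secs


if __name__ == "__main__":
    for day, t in timings.items():
        ratio = slowdown(t["new"], t["old"])
        print(f"{day}: new={t['new']}s old={t['old']}s new/old={ratio:.2f}")
```

The ratio is below 1 for the two most recent days and well above 1 for the two older days, which is the regression described in this post.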

 

We expected HBase to perform better in the upgraded environment. Any thoughts on what could be wrong? All other configurations are the same between the environments, except Kerberos.

Note: Rowkey scans are fast, but our requirement is to do time range scans. We also checked multiple tables, and the issue occurs with all HBase tables.

 

Regards,

Diwakar

Re: HBase timerange scan slow in HDP 3.0

Hi,

 

Any thoughts on the issue above? What could be going wrong?

One observation is that only 4 of the 11 region servers do any work for the scan request. Even so, performance should be better than in the previous HDP 2 environment.

 

Regards,

Diwakar