
HBase Scan slow after inserting a million records in table


Rising Star

Hi,

I am on HDP 2.3.4 (3-node cluster). My HBase scans are slow after inserting a million rows of data.

As I am a newbie to HBase, what suggestions can experts provide to help me tune performance?

Would really appreciate the help.

Thanks,

Divya

1 ACCEPTED SOLUTION


Re: HBase Scan slow after inserting a million records in table

Rising Star

@Divya Gehlot Are you specifying start and stop keys in your scans? An open-ended scan that doesn't specify start and stop keys usually ends up as a complete table scan and hence becomes slow. As @Randy Gelhausen mentioned, an optimal rowkey design will help you specify start and stop keys.
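To illustrate why bounds matter, here is a minimal Java sketch that uses a `TreeMap` as a stand-in for HBase's lexicographically sorted rowkeys. The table contents and the `user123` prefix are made-up examples, and the stop-key derivation deliberately ignores the 0xFF edge case a real implementation would need to handle:

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class BoundedScanSketch {
    // Derive an exclusive stop key for a prefix scan by incrementing
    // the last character of the prefix (simplified: no 0xFF handling).
    static String stopKeyFor(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    public static void main(String[] args) {
        // TreeMap stands in for HBase's rowkey-sorted storage.
        TreeMap<String, String> table = new TreeMap<>();
        table.put("user100|2016", "a");
        table.put("user123|2016-01", "b");
        table.put("user123|2016-02", "c");
        table.put("user200|2016", "d");

        // Bounded "scan": only the rows under the user123 prefix are
        // visited, instead of every row in the table.
        SortedMap<String, String> range =
            table.subMap("user123", stopKeyFor("user123"));
        System.out.println(range.size()); // prints 2
    }
}
```

The same idea applies to a real `Scan`: supplying the start and stop keys lets the RegionServer seek straight to the first matching row instead of reading the whole table.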


Re: HBase Scan slow after inserting a million records in table

Super Guru

@Divya Gehlot

A couple of suggestions:

  • HBase is not performant for scans, as it is a database built for random reads/writes.
  • If scans must be performed, do them on the row key, not on column values.
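The second point can be illustrated with a small Java sketch, again using a `TreeMap` as a stand-in for HBase's rowkey-sorted storage (the row counts and key format are made up). A predicate on a column value forces every row to be examined, while a predicate expressed as a key range seeks straight to the matching rows:

```java
import java.util.Map;
import java.util.TreeMap;

public class KeyVsColumnScan {
    // Build a toy table; TreeMap stands in for HBase's sorted store.
    static TreeMap<String, String> toyTable(int n) {
        TreeMap<String, String> rows = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            rows.put(String.format("row%04d", i), "v" + (i % 10));
        }
        return rows;
    }

    // A predicate on a column VALUE must examine every row.
    static int rowsVisitedByValueFilter(TreeMap<String, String> rows) {
        int visited = 0;
        for (Map.Entry<String, String> e : rows.entrySet()) {
            visited++; // every row is touched, whether it matches or not
        }
        return visited;
    }

    // A predicate on the KEY uses sorted order to seek to the range.
    static int rowsVisitedByKeyRange(TreeMap<String, String> rows,
                                     String start, String stop) {
        return rows.subMap(start, stop).size();
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = toyTable(1000);
        System.out.println(rowsVisitedByValueFilter(rows));                    // 1000
        System.out.println(rowsVisitedByKeyRange(rows, "row0100", "row0200")); // 100
    }
}
```

In real HBase the same asymmetry holds: a `ValueFilter` still reads every row server-side, while start/stop keys avoid the work entirely.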

Re: HBase Scan slow after inserting a million records in table

Hi @Divya Gehlot, go to HBase -> Quick Links -> HBase Master UI, then select Table Details at the top, locate your table, and click on it. It will show you the table's regions, their server layout, and the number of requests per region. You can then consider splitting overly busy regions and moving some regions to other nodes for better load balancing. Refer to this for split/move, and to this for a good backgrounder. Since you have only 3 nodes, the results might be limited. Regarding other properties, if you can afford it, be sure to have enough RAM for the RegionServers, not less than 16 GB.

Re: HBase Scan slow after inserting a million records in table

@Divya Gehlot - as @Sunile Manjee noted, HBase is an indexed lookup system that can also perform scans. This means you need to think a bit about your data access/query patterns before you can create an optimal table design.

In general, you want to design your rowkeys around your access patterns. Ensure the highest-order rowkey bits can always be known to your application at HBase read time; otherwise your access will be a full scan instead of a range scan.
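As a concrete sketch of that advice, here is one common composite-rowkey pattern in Java. The `customerId` field and the `<id>|<reversed timestamp>` layout are hypothetical examples, not a prescription: the field the application always knows goes first (keeping reads a range scan), and the timestamp is reversed so the newest rows sort first:

```java
public class RowkeySketch {
    // Hypothetical composite rowkey: <customerId>|<reversed epoch millis>.
    // Reversing the timestamp (MAX_VALUE - t) makes newer events sort
    // lexicographically before older ones.
    static String rowkey(String customerId, long epochMillis) {
        long reversed = Long.MAX_VALUE - epochMillis;
        return String.format("%s|%019d", customerId, reversed);
    }

    public static void main(String[] args) {
        String newer = rowkey("cust42", 2_000L);
        String older = rowkey("cust42", 1_000L);
        // The newer event sorts before the older one, so a scan from
        // the bare "cust42|" prefix returns most-recent rows first.
        System.out.println(newer.compareTo(older) < 0); // prints true
    }
}
```

The zero-padded `%019d` format matters: without fixed-width encoding, lexicographic order would not match numeric order.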

Users of the raw HBase API often find themselves performing logic in their application code instead of server-side within HBase's RegionServer processes. A simple but powerful way to avoid both writing large amounts of client application code and pulling significant chunks of data back to the client is to use Apache Phoenix on top of HBase. It makes it easy to perform a more selective HBase query via SQL, and it also:

1. Lends itself more naturally to thinking about how data is laid out in your tables

2. Lets you define secondary indexes on the data your queries access, regardless of whether your application knows a specific rowkey (or range) it needs to access.
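For a sense of what that looks like, here is a hedged Phoenix SQL sketch; the table, column, and index names are made up for illustration:

```sql
-- Hypothetical event table: the primary key becomes the HBase rowkey,
-- so queries filtering on customer_id become bounded range scans.
CREATE TABLE events (
    customer_id VARCHAR NOT NULL,
    event_time  TIMESTAMP NOT NULL,
    payload     VARCHAR
    CONSTRAINT pk PRIMARY KEY (customer_id, event_time)
);

-- Secondary index lets time-based queries avoid a full table scan
-- even though event_time is not the leading rowkey field.
CREATE INDEX events_by_time ON events (event_time);

-- Phoenix pushes this predicate to the RegionServers instead of
-- pulling rows back to the client for filtering.
SELECT customer_id, payload
FROM events
WHERE event_time > CURRENT_DATE() - 1;
```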


Re: HBase Scan slow after inserting a million records in table

New Contributor

If you are using the HBase shell for scanning, you can try:

> scan '<table>', CACHE => 1000

The CACHE parameter tells the HBase RegionServer to buffer that many rows before returning to the client, which can save a lot of RPC calls.
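The RPC savings are easy to see with some back-of-the-envelope arithmetic, sketched here in Java (the million-row figure is just the number from the original question, not a measurement):

```java
public class ScannerCachingSketch {
    // Each scanner RPC returns up to `caching` rows, so fetching
    // `totalRows` rows costs roughly ceil(totalRows / caching) calls.
    static long rpcCalls(long totalRows, long caching) {
        return (totalRows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        System.out.println(rpcCalls(1_000_000, 1));     // prints 1000000
        System.out.println(rpcCalls(1_000_000, 1000));  // prints 1000
    }
}
```

The trade-off is memory: each in-flight RPC holds `caching` rows on both the RegionServer and the client, so very large values can cause timeouts or memory pressure with wide rows.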