Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Accessing HBase Table through Hive is very slow

SOLVED Go to solution
Highlighted

Accessing HBase Table through Hive is very slow

Expert Contributor

I created a Hive Table, that can access another HBase table. Something like this:

create external table hbase_50mio_200regions (rowkey String, valuelist String)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:valueList")
  TBLPROPERTIES ('hbase.table.name' = 'hbase_50mio_200regions')

The access works and HiveQL queries are fast for small HBase table.

The problems are the queries to huge HBase tables. It takes very long to finish e.g. a simple count (takes >1h on 50 Mio rows HBase table with about 1300 GB of HFiles)!

Is there a way to pass start and stop key to my Hive SELECT or COUNT queries?

As I saw in the following question https://stackoverflow.com/a/40729818 the WHERE conditions won't avoid a full table scan, as they only filter the rows after iterating all rows of the HBase table.

Is this right and/or is there another way to avoid full table scans when accessing my HBase table through Hive?

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Accessing HBase Table through Hive is very slow

1. COUNT will result in a full table scan and hence the query is slow.
2. Where on the primary key will be fast as it will do a lookup and not a scan.
3. Where used on any column apart from the primary key will result in a HBase full table scan.
4. Analyse table once to speed up count queries. But it will not affect the where on no-primary key.

1 REPLY 1

Re: Accessing HBase Table through Hive is very slow

1. COUNT will result in a full table scan and hence the query is slow.
2. Where on the primary key will be fast as it will do a lookup and not a scan.
3. Where used on any column apart from the primary key will result in a HBase full table scan.
4. Analyse table once to speed up count queries. But it will not affect the where on no-primary key.