I created a Hive Table, that can access another HBase table. Something like this:
create external table hbase_50mio_200regions (rowkey String, valuelist String)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:valueList")
TBLPROPERTIES ('hbase.table.name' = 'hbase_50mio_200regions')
The access works and HiveQL queries are fast for small HBase table.
The problems are the queries to huge HBase tables. It takes very long to finish e.g. a simple count (takes >1h on 50 Mio rows HBase table with about 1300 GB of HFiles)!
Is there a way to pass start and stop key to my Hive SELECT or COUNT queries?
As I saw in the following question https://stackoverflow.com/a/40729818 the WHERE conditions won't avoid a full table scan, as they only filter the rows after iterating all rows of the HBase table.
Is this right and/or is there another way to avoid full table scans when accessing my HBase table through Hive?