Need benchmarking for various filters in Hbase

Hi All,

I have the following use case:

RowKey : Combination of (Long.MAX_VALUE - timestamp)|datasourceId|SiteId|...
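
To make the key layout concrete, here is a minimal sketch of how such a row key could be built. It assumes the key is stored as a zero-padded decimal string with only the first three components; the post does not say how the key is actually encoded, and the class and method names are hypothetical.

```java
import java.util.Locale;

final class ReverseTimestampKey {
    // Assumption: the reversed timestamp is stored as a decimal string.
    // Zero-padding to 19 digits (the width of Long.MAX_VALUE) keeps the
    // lexicographic order of the string equal to the numeric order.
    static String reverseTs(long timestampMillis) {
        return String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - timestampMillis);
    }

    // Builds (Long.MAX_VALUE - timestamp)|datasourceId|siteId.
    static String rowKey(long timestampMillis, String datasourceId, String siteId) {
        return reverseTs(timestampMillis) + "|" + datasourceId + "|" + siteId;
    }

    public static void main(String[] args) {
        String older = rowKey(1_000L, "ds1", "siteA");
        String newer = rowKey(2_000L, "ds1", "siteA");
        // A larger timestamp gives a smaller reversed value, so the newer
        // row compares lower and would be returned first by a forward scan.
        System.out.println(older);
        System.out.println(newer);
        System.out.println("newer sorts first: " + (newer.compareTo(older) < 0));
    }
}
```

Because HBase returns rows in ascending row key order, subtracting the timestamp from Long.MAX_VALUE is what makes the newest rows come back first.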

Now, I want to get results for a particular datasourceId, sorted by timestamp in descending order (which is why the key starts with Long.MAX_VALUE - timestamp). For this, I have used a RegexFilter on the row key to select the rows for that datasourceId. Since the reversed timestamp is the leading part of the row key, the results come back sorted.

But the query is very slow. I suspect the RegexFilter is the cause, because it has to check every row key against the regex. What can be the solution for this?
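
The suspected cost can be illustrated without a cluster. The sketch below is only a simulation: a TreeMap stands in for the sorted table, the row keys are assumed to be zero-padded strings, and all names and the sample data are made up. It counts how many row keys a regex on the datasourceId has to test when the key starts with the reversed timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;
import java.util.regex.Pattern;

final class RegexScanCost {
    /** Matching keys plus the number of row keys that had to be tested. */
    static final class ScanResult {
        final List<String> matches = new ArrayList<>();
        int rowsExamined = 0;
    }

    static String rowKey(long ts, String datasourceId, String siteId) {
        return String.format(Locale.ROOT, "%019d|%s|%s",
                Long.MAX_VALUE - ts, datasourceId, siteId);
    }

    // Hypothetical sample data: 1000 rows spread over 10 datasources.
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        for (long ts = 1; ts <= 1000; ts++) {
            table.put(rowKey(ts, "ds" + (ts % 10), "siteA"), "row data");
        }
        return table;
    }

    // Stand-in for a scan with a regex on the row key. The datasourceId is
    // not a key prefix, so no key range can be skipped: every key is tested.
    static ScanResult regexScan(TreeMap<String, String> table, String datasourceId) {
        Pattern p = Pattern.compile("^\\d{19}\\|" + Pattern.quote(datasourceId) + "\\|.*$");
        ScanResult result = new ScanResult();
        for (String key : table.keySet()) {
            result.rowsExamined++;
            if (p.matcher(key).matches()) {
                result.matches.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ScanResult r = regexScan(sampleTable(), "ds3");
        System.out.println("matches=" + r.matches.size()
                + " rowsExamined=" + r.rowsExamined);
    }
}
```

In this simulation the matches do come back newest first, but only after every row key in the table has been compared against the pattern, which is consistent with the slowdown described above.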

PS: I have multiple columns for each rowKey and want to get all columns.

Here are the two approaches I have come across:

1) Make a new column for datasourceId and apply a ColumnPrefixFilter to fetch the results. That way the results would still be sorted, and filtered on the basis of datasourceId. But I doubt this will make any difference to the HBase response time, since it will again fetch each column and check it.

2) Use secondary indexes. How can I use secondary indexes in this case?
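
For the second approach, one common pattern is a manually maintained index table whose row key starts with the datasourceId. The sketch below only simulates that idea with two TreeMaps standing in for HBase tables; the index key layout (datasourceId|reversedTimestamp|siteId), the class, and the method names are all assumptions for illustration, not an HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

final class IndexTableSketch {
    // Main table: (Long.MAX_VALUE - ts)|datasourceId|siteId -> row data.
    final TreeMap<String, String> mainTable = new TreeMap<>();
    // Hypothetical index table: datasourceId|(Long.MAX_VALUE - ts)|siteId -> main row key.
    final TreeMap<String, String> indexTable = new TreeMap<>();

    static String reverseTs(long ts) {
        return String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - ts);
    }

    // Every write goes to both tables so the index stays in step.
    void put(long ts, String datasourceId, String siteId, String data) {
        String mainKey = reverseTs(ts) + "|" + datasourceId + "|" + siteId;
        String indexKey = datasourceId + "|" + reverseTs(ts) + "|" + siteId;
        mainTable.put(mainKey, data);
        indexTable.put(indexKey, mainKey);
    }

    // Range scan over the index: only keys with the prefix "datasourceId|"
    // are visited. '}' is the character right after '|' in ASCII, so
    // "datasourceId}" is an exclusive stop key for that prefix.
    List<String> newestFirst(String datasourceId) {
        String start = datasourceId + "|";
        String stop = datasourceId + "}";
        List<String> rows = new ArrayList<>();
        for (String mainKey : indexTable.subMap(start, stop).values()) {
            rows.add(mainTable.get(mainKey)); // point lookup in the main table
        }
        return rows;
    }

    public static void main(String[] args) {
        IndexTableSketch t = new IndexTableSketch();
        t.put(1_000L, "ds1", "siteA", "old");
        t.put(3_000L, "ds1", "siteA", "new");
        t.put(2_000L, "ds10", "siteA", "other datasource");
        System.out.println(t.newestFirst("ds1"));
    }
}
```

In a real table the same idea would map to a scan bounded by start and stop row keys on the index table, followed by gets on the main table. The trade-off is that every write has to update two tables, and keeping them consistent is left to the application.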

Any help will be appreciated.
