
What is the best way to read an HBase table with partial row key values, and how can a skip scan be done using SHC?



We have a requirement where the input is a list of values for one column that forms the ROWKEY prefix, and HBase should load the matching data quickly for this input.

Assume the row key is designed as a combination of 3 columns: <column1,column2,column3>

The expected input is a list of column1 values. The values of the other rowkey columns are not known.

The input can also be random values of column1. They are not necessarily in sequential order, so whole regions cannot simply be skipped.

Could you suggest or recommend the right approach to get good read performance with only part of the rowkey (mostly the prefix)?

I use the approach below, but it takes a long time. I also noticed that it takes more time when the input list becomes smaller (e.g., the input list has 1 to 10 column1 values).

Step 1: val df = withCatalog(cat, spark)
// withCatalog is a utility that loads the HBase table with a catalog, as in SHC (https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/)
Step 2: val s = df.filter(df("rowkey").rlike("1803440523134880|1803440523134881"))
Step 2 takes the list of values delimited by '|' ("1803440523134880|1803440523134881"), so that it searches for each input.
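
One alternative that may be worth trying, sketched here under stated assumptions: Spark's data source filter API has no regex filter, so an rlike predicate is evaluated by Spark after the rows are read rather than being handed to the connector. A prefix match can instead be written as a half-open range on the rowkey (prefix <= rowkey < next key after the prefix), which uses only comparison operators. The names PrefixRangeFilter, stopKey and rowkeyPrefixFilter are hypothetical helpers, not part of SHC. The sketch assumes the rowkey is exposed as a plain ASCII string column named "rowkey" in the catalog. Whether SHC turns these ranges into HBase scan ranges depends on the SHC version and on how the rowkey is declared, so it is worth checking the physical plan with explain().

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col

// Hypothetical helper (not an SHC API): builds range predicates from rowkey prefixes.
object PrefixRangeFilter {

  // Exclusive upper bound for a prefix: the prefix with its last character
  // incremented by one. Every key that starts with `prefix` sorts below it.
  // Assumes ASCII string keys, where character order matches byte order.
  def stopKey(prefix: String): String = {
    require(prefix.nonEmpty, "prefix must not be empty")
    require(prefix.last != Char.MaxValue, "last character cannot be incremented")
    prefix.init + (prefix.last + 1).toChar
  }

  // OR together one [prefix, stopKey(prefix)) range per distinct prefix.
  def rowkeyPrefixFilter(keyColumn: String, prefixes: Seq[String]): Column = {
    require(prefixes.nonEmpty, "at least one prefix is required")
    prefixes.distinct
      .map(p => col(keyColumn) >= p && col(keyColumn) < stopKey(p))
      .reduce(_ || _)
  }
}

// Usage with the DataFrame from Step 1 (df is assumed to come from withCatalog):
// val s = df.filter(
//   PrefixRangeFilter.rowkeyPrefixFilter("rowkey", Seq("1803440523134880", "1803440523134881")))
```

Note that this matches keys that start with each value, whereas the rlike pattern in Step 2 matches the value anywhere in the rowkey, so results can differ if a value also occurs in column2 or column3.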

I appreciate your time and help in making the best use of the SHC connector in our project. I am looking for an SHC feature similar to Phoenix's skip scan.
