We have a requirement where the input is a list of values for a single column that forms part of the ROWKEY prefix, and HBase should be fast enough to load the data for this input.
Assume the row key is a composite of three columns: <column1,column2,column3>.
The expected input is a list of column1 values; the values of the other row-key columns are not known.
The input column1 values can also be random, so they are not guaranteed to arrive in a sequential order that would allow skipping regions.
Could you suggest or recommend the right approach to get good read performance with only part of the row key (mostly the prefix)?
I use the approach below, but it takes a long time. I noticed that it takes even longer when the input list is smaller (e.g., a list of 1 to 10 values).
Step 1: val df = withCatalog(cat, spark)
// withCatalog is a utility that loads an HBase table from a catalog, as in SHC (https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/)
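For context, a minimal version of this setup might look like the following; the catalog JSON, table name, and column mappings here are illustrative assumptions, not our actual schema:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Illustrative SHC catalog: "rowkey" maps the composite key as a single
// string column; real table and column names are placeholders here.
val cat =
  s"""{
     |  "table": {"namespace": "default", "name": "mytable"},
     |  "rowkey": "key",
     |  "columns": {
     |    "rowkey": {"cf": "rowkey", "col": "key",   "type": "string"},
     |    "value":  {"cf": "cf1",    "col": "value", "type": "string"}
     |  }
     |}""".stripMargin

// Loads the HBase table as a DataFrame through the SHC data source.
def withCatalog(cat: String, spark: SparkSession): DataFrame =
  spark.read
    .options(Map(HBaseTableCatalog.tableCatalog -> cat))
    .format("org.apache.spark.sql.execution.datasources.hbase")
    .load()
```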
Step 2: val s = df.filter(df("rowkey").rlike("1803440523134880|1803440523134881"))
Step 2 passes the list of input values delimited by '|' ("1803440523134880|1803440523134881"), so the filter searches for each input value.
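One thing worth noting: a regex predicate like rlike generally cannot be pushed down to HBase, so Step 2 ends up scanning the whole table regardless of list size. If I understand SHC's pushdown correctly (an assumption worth verifying against the connector version in use), a startsWith predicate on the rowkey can be translated into an HBase prefix scan, so ORing one startsWith per input value may avoid the full scan:

```scala
// Assumption: SHC can push StartsWith on the rowkey down to HBase as a
// prefix scan, while rlike (regex) forces a full table scan.
// Sample input values; in practice this is the incoming column1 list.
val inputs = Seq("1803440523134880", "1803440523134881")

// Build one prefix predicate per value and OR them together.
val prefixPred = inputs
  .map(v => df("rowkey").startsWith(v))
  .reduce(_ || _)

val s = df.filter(prefixPred)
```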
I appreciate your time and help in making the best use of the SHC connector in our project. I am looking for an SHC feature that supports something similar to Phoenix's skip scan.
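As a point of comparison, outside SHC the plain HBase client can approximate a Phoenix-style skip scan with MultiRowRangeFilter, which makes a single scan seek across many disjoint rowkey ranges, one per prefix. This is only a sketch: the prefix list is sample data, and the stop-key computation is simplified (it ignores the trailing-0xFF edge case):

```scala
import scala.collection.JavaConverters._
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.filter.MultiRowRangeFilter
import org.apache.hadoop.hbase.filter.MultiRowRangeFilter.RowRange
import org.apache.hadoop.hbase.util.Bytes

val prefixes = Seq("1803440523134880", "1803440523134881")

// Stop key for a prefix range: copy the prefix and increment its last
// byte (simplified; does not handle a prefix ending in 0xFF).
def prefixStop(prefix: Array[Byte]): Array[Byte] = {
  val stop = prefix.clone()
  stop(stop.length - 1) = (stop(stop.length - 1) + 1).toByte
  stop
}

// One [prefix, nextPrefix) range per input value; the filter seeks
// directly between ranges instead of scanning everything in between.
val ranges = prefixes.map { p =>
  val start = Bytes.toBytes(p)
  new RowRange(start, true, prefixStop(start), false)
}

val scan = new Scan().setFilter(new MultiRowRangeFilter(ranges.asJava))
```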