Prefix scans are a very common operation in HBase. For example, when reading an HBase table, we may use the following table scan API:
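import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.filter.PrefixFilter
// prefix: Array[Byte] holding the row-key prefix to match
val prefixFilter = new PrefixFilter(prefix)
val scan: Scan = new Scan()
scan.setFilter(prefixFilter)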
However, the code above can be very slow when scanning a large HBase table. The reason is that PrefixFilter alone does not tell the scan where to start: we need to set the start row (StartRow) before using PrefixFilter. Without a proper start row, the scan may have to begin at the very first region of the table and waste a lot of time reading and discarding rows before it reaches the first matching row.
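A minimal sketch of that fix, assuming an HBase 1.x/2.x client where Scan.setStartRow is available (newer clients offer withStartRow instead). The prefix value "user123|" is hypothetical and only there for illustration:

```scala
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.filter.PrefixFilter
import org.apache.hadoop.hbase.util.Bytes

// Hypothetical row-key prefix, used only for illustration.
val prefix: Array[Byte] = Bytes.toBytes("user123|")

val scan: Scan = new Scan()
// Begin at the first row key that can match the prefix,
// instead of at the first row of the table.
scan.setStartRow(prefix)
// PrefixFilter still drops any row whose key does not start with the prefix.
scan.setFilter(new PrefixFilter(prefix))
```

Because row keys are sorted, every key that starts with the prefix sorts at or after the prefix itself, so starting there should not skip any matching row.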
The recommended way is to use setRowPrefixFilter(byte[] rowPrefix). Its source code shows that it sets up the start row for us before the table scan runs.
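As far as I can tell from the HBase 1.x source, setRowPrefixFilter sets the start row to the prefix and the stop row to the closest row key after the prefix range. The sketch below reproduces that stop-row calculation in plain Scala, so it runs without HBase. The function name stopRowForPrefix is my own and not part of the HBase API:

```scala
// Sketch: the smallest row key that sorts after every key starting with `prefix`.
// This is an illustration of the idea, not the HBase implementation itself.
def stopRowForPrefix(prefix: Array[Byte]): Array[Byte] = {
  // Trailing 0xFF bytes cannot be incremented, so drop them first.
  val trimmed = prefix.reverse.dropWhile(_ == 0xFF.toByte).reverse
  if (trimmed.isEmpty) {
    // Empty prefix, or all bytes are 0xFF: no upper bound, scan to the end of the table.
    Array.emptyByteArray
  } else {
    // Increment the last remaining byte to get the exclusive upper bound.
    trimmed.init :+ (trimmed.last + 1).toByte
  }
}
```

With the HBase client on the classpath, calling scan.setRowPrefixFilter(prefix) on a Scan should give the same start and stop rows without any of this manual work, which is why it is the simpler choice.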
In addition, if you want to load an HBase table into Spark, you can also use the Spark-HBase connector, which supports accessing HBase tables from Spark as an external data source. Its buildScan() method performs the HBase table scan and returns an RDD as the result. Its related source code is here.
Thanks to Weiqing Yang and Ted Yu for the kind help.