Scan salted HBase table

Expert Contributor

I have a HBase table with a key like this:

key = <salt>:<id>#<class>
00:123#A
00:234#A
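To make the key layout concrete, here is a minimal sketch of a formatter for it. The class and method names are hypothetical, and the salt is passed in as a parameter because the post does not say how the salt is derived from the data.

```java
import java.util.Locale;

// Hypothetical helper, not from the original post: formats a row key
// in the <salt>:<id>#<class> layout with a zero-padded two-digit salt.
class SaltedKeyFormat {
    static String format(int salt, String id, String clazz) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%02d:%s#%s", salt, id, clazz);
    }
}
```

For example, SaltedKeyFormat.format(0, "123", "A") yields the first sample key shown above.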

The data is spread into 20 regions, identified by "00" to "19". I created the HBase table with this command:

create 'testtable', {NAME => 'k', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY'}, {NAME => 'b', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY'}, {NAME => 't', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY'}, {SPLITS => [ '00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19']}

Now I need to scan my table, filtering for a specific <id> value.

Before I salted my data, I could use a PrefixFilter in my Scan and everything worked fine. Here is the old code:

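// Row-key prefix to match (the unsalted <id>).
byte[] prefix = Bytes.toBytes("123");
// Start scanning at the first row that could match the prefix.
Scan scan = new Scan(prefix);
Filter prefixFilter = new PrefixFilter(prefix);
FilterList list = new FilterList(prefixFilter, new KeyOnlyFilter());
// Attach the filters; without this call the scan ignores them.
scan.setFilter(list);
ResultScanner scanner = tableToScan.getScanner(scan);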

This no longer works now that the row keys are salted, because every key now starts with the salt instead of the <id>.

How can I use a PrefixFilter to filter by <id> across all of the salted keys?
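One way to think about the problem: with 20 salt buckets, a single unsalted prefix becomes 20 salted prefixes, one per bucket. The sketch below only builds those prefix strings. The class and method names are hypothetical, and it assumes the salts are exactly the two-digit values "00" to "19" described in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not from the original post.
class SaltedPrefixes {
    // Number of salt buckets ("00" to "19"), as stated in the question.
    static final int BUCKETS = 20;

    // Builds one row-key prefix per salt bucket for the given id.
    // The trailing '#' keeps id "123" from also matching id "1234".
    static List<String> forId(String id) {
        List<String> prefixes = new ArrayList<>(BUCKETS);
        for (int salt = 0; salt < BUCKETS; salt++) {
            prefixes.add(String.format(Locale.ROOT, "%02d:%s#", salt, id));
        }
        return prefixes;
    }
}
```

Each prefix could then drive its own Scan, with Bytes.toBytes(prefix) as the start row and a PrefixFilter on the same bytes, and the results merged on the client. This is untested against a real cluster and costs one scan per bucket.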


Rising Star

I haven't tried this myself, and I don't envy you the challenge, but overriding the getSplits() method of TableInputFormat seems to work: