Scan salted HBase table



Expert Contributor

I have an HBase table with a key like this:

key = <salt>:<id>#<class>
00:123#A
00:234#A

The data is spread into 20 regions, identified by "00" to "19". I created the HBase table with this command:

create 'testtable', {NAME => 'k', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY'}, {NAME => 'b', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY'}, {NAME => 't', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY'}, {SPLITS => [ '00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19']}

Now I need to Scan my table, filtering for a specific <id> value!

Before I salted my data, I could use a PrefixFilter in my Scan and everything worked fine. Here is the OLD code:

// Row-key prefix to match; Bytes.toBytes(String) already returns a byte[].
byte[] prefix = Bytes.toBytes("123");
Scan scan = new Scan(prefix);
Filter prefixFilter = new PrefixFilter(prefix);
FilterList list = new FilterList(prefixFilter, new KeyOnlyFilter());
scan.setFilter(list); // attach the filters to the scan
ResultScanner scanner = tableToScan.getScanner(scan);

This doesn't work anymore now that the keys are salted!

How can I use the PrefixFilter for filtering by <id> in the whole key?


Re: Scan salted HBase table

Rising Star

I haven't tried this myself, and I don't envy you the challenge, but overriding the getSplits() method of TableInputFormat seems to work.
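
Whichever route is taken (a custom getSplits() or simply running one scan per salt from the client), the row-key ranges to cover look roughly like the sketch below. This is only an illustration, not the code from the approach mentioned above. It assumes the salt is a two-digit, zero-padded prefix "00" to "19" followed by ":", as in the key format from the question, and the names SaltedScanRanges and rangesForId are made up for this example. It uses only the Java standard library; each resulting start/stop pair would then be applied to its own HBase Scan (setStartRow/setStopRow, or withStartRow/withStopRow on newer clients).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: one [startRow, stopRow) range per salt bucket for a given id. */
class SaltedScanRanges {

    /** A half-open row-key range: startRow inclusive, stopRow exclusive. */
    static final class Range {
        final byte[] startRow;
        final byte[] stopRow;

        Range(byte[] startRow, byte[] stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    /**
     * Builds the ranges for keys of the form <salt>:<id>#<class>.
     * Assumption: salts are zero-padded two-digit strings "00", "01", ...
     */
    static List<Range> rangesForId(String id, int saltCount) {
        List<Range> ranges = new ArrayList<>();
        for (int salt = 0; salt < saltCount; salt++) {
            // Include the '#' delimiter so that id "123" does not also match "1234".
            String prefix = String.format(Locale.ROOT, "%02d:%s#", salt, id);
            byte[] start = prefix.getBytes(StandardCharsets.UTF_8);
            // Exclusive stop row: copy the prefix and increment its last byte
            // ('#' becomes '$'), which sorts after every key with this prefix.
            byte[] stop = start.clone();
            stop[stop.length - 1]++;
            ranges.add(new Range(start, stop));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Print the 20 ranges that together cover id "123" across all salts.
        for (Range r : rangesForId("123", 20)) {
            System.out.println(new String(r.startRow, StandardCharsets.UTF_8)
                    + " .. " + new String(r.stopRow, StandardCharsets.UTF_8));
        }
    }
}
```

With ranges like these, each scan only touches the rows for that <id> within a single salt bucket, so no PrefixFilter is needed; the KeyOnlyFilter from the old code can still be set on every scan. If the salt can be recomputed from the <id> itself, a single range would be enough, but the question does not say how the salt is derived.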
