How to randomly display rows in HBase?

Expert Contributor

I am querying HBase to get a set of keys and values using the LIMIT clause. Here is the query:

hbase(main):015:0> scan 'sample_table', {FILTER => "KeyOnlyFilter()",TIMESTAMP => 11, LIMIT => 2}

This returns some output. If I repeat the same query, I get the same output. What I need is different output every time I execute the query. In Hive, we can use rand() to get different output each time we query with a LIMIT clause. Is there something similar in HBase?

4 REPLIES

You can use RandomRowFilter (https://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/filter/RandomRowFilter.html).

scan 't1', {FILTER => org.apache.hadoop.hbase.filter.RandomRowFilter.new(0.5), LIMIT => 1}

Expert Contributor

When I query the HBase table using your query, I get the same row again and again. I also changed the chance argument to different float values, but no luck.

@Alex Raj, when I tried it, the rows were properly randomized.

hbase(main):006:0> put 't1','1','f1:c1','1'
0 row(s) in 0.0080 seconds


hbase(main):007:0> put 't1','2','f1:c1','2'
0 row(s) in 0.0100 seconds


hbase(main):008:0> put 't1','3','f1:c1','3'
0 row(s) in 0.0030 seconds


hbase(main):009:0> put 't1','4','f1:c1','4'
0 row(s) in 0.0030 seconds


hbase(main):026:0> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.RandomRowFilter.new(0.5), LIMIT => 1}
ROW                                                  COLUMN+CELL
 1                                                   column=f1:c1, timestamp=1476711506608, value=1
1 row(s) in 0.0070 seconds
hbase(main):027:0> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.RandomRowFilter.new(0.5), LIMIT => 1}
ROW                                                  COLUMN+CELL
 4                                                   column=f1:c1, timestamp=1476711517300, value=4
1 row(s) in 0.0140 seconds
hbase(main):028:0> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.RandomRowFilter.new(0.5), LIMIT => 1}
ROW                                                  COLUMN+CELL
 1                                                   column=f1:c1, timestamp=1476711506608, value=1
1 row(s) in 0.0060 seconds
hbase(main):029:0> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.RandomRowFilter.new(0.5), LIMIT => 1}
ROW                                                  COLUMN+CELL
 1                                                   column=f1:c1, timestamp=1476711506608, value=1
1 row(s) in 0.0150 seconds
hbase(main):030:0> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.RandomRowFilter.new(0.5), LIMIT => 1}
ROW                                                  COLUMN+CELL
 2                                                   column=f1:c1, timestamp=1476711511117, value=2
1 row(s) in 0.0070 seconds

If you are using TIMESTAMP => 11, make sure you have enough keys (more than 2) with a timestamp less than 11.
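
A note on why the results above can still look repetitive (row 1 comes back in three of the five scans). As far as I understand RandomRowFilter, it does not pick a row uniformly at random. The scan walks the rows in key order and the filter flips a coin for each row with probability `chance`. With LIMIT => 1, the scan stops at the first row that passes, so earlier keys are favored. The plain-Ruby sketch below only simulates that behavior and does not talk to HBase. `simulated_scan` is a hypothetical helper written for illustration, and the four row keys mirror the 't1' example.

```ruby
# Simulate a key-ordered scan with a per-row coin flip and a row limit.
# This is an illustration of the assumed filter behavior, not HBase code.
#   rows   - row keys in scan (sorted) order
#   chance - probability that a single row passes the filter
#   limit  - stop once this many rows have passed
#   rng    - a Random instance, passed in so runs can be reproduced
def simulated_scan(rows, chance, limit, rng)
  picked = []
  rows.each do |row|
    break if picked.size >= limit
    picked << row if rng.rand < chance
  end
  picked
end

rng    = Random.new(42)
rows   = %w[1 2 3 4]
trials = 10_000
counts = Hash.new(0)

# Repeat the "scan with LIMIT => 1" many times and count which row comes back.
trials.times do
  first = simulated_scan(rows, 0.5, 1, rng).first
  counts[first || "(none)"] += 1
end

(rows + ["(none)"]).each do |key|
  share = (100.0 * counts[key] / trials).round(1)
  puts "row #{key}: #{share}%"
end
```

With chance 0.5 and four rows, the expected shares are roughly 50%, 25%, 12.5% and 6.25% for rows 1 to 4, with the remaining 6.25% of scans returning nothing. On a large table, lowering `chance` spreads the picks further into the key space, but the earliest keys still have the best odds, and a very small table will keep returning its first few rows.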

Expert Contributor

It still does not work. How can I add another filter (KeyOnlyFilter()) along with this filter? I tried combining them with AND, but it did not work.