How to Randomly display the rows in HBase?

Expert Contributor

I am querying HBase to get a set of keys and values using LIMIT. Here is the query:

hbase(main):015:0> scan 'sample_table', {FILTER => "KeyOnlyFilter()",TIMESTAMP => 11, LIMIT => 2}

This returns some rows, but repeating the query returns the same rows every time. I need different output on each run. In Hive, we can use rand() to get different rows each time we query with a LIMIT clause. Is there something similar in HBase?

4 REPLIES

Re: How to Randomly display the rows in HBase?

You can use RandomRowFilter (https://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/filter/RandomRowFilter.html):

scan 't1', {FILTER => org.apache.hadoop.hbase.filter.RandomRowFilter.new(0.5), LIMIT => 1}

Re: How to Randomly display the rows in HBase?

Expert Contributor

When I run your query against my HBase table, I get the same row again and again. I also changed the chance argument to different float values, but no luck.

Re: How to Randomly display the rows in HBase?

@Alex Raj, when I tried it, I got proper randomization:

hbase(main):006:0> put 't1','1','f1:c1','1'
0 row(s) in 0.0080 seconds


hbase(main):007:0> put 't1','2','f1:c1','2'
0 row(s) in 0.0100 seconds


hbase(main):008:0> put 't1','3','f1:c1','3'
0 row(s) in 0.0030 seconds


hbase(main):009:0> put 't1','4','f1:c1','4'
0 row(s) in 0.0030 seconds


hbase(main):026:0> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.RandomRowFilter.new(0.5), LIMIT => 1}
ROW                                                  COLUMN+CELL
 1                                                   column=f1:c1, timestamp=1476711506608, value=1
1 row(s) in 0.0070 seconds
hbase(main):027:0> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.RandomRowFilter.new(0.5), LIMIT => 1}
ROW                                                  COLUMN+CELL
 4                                                   column=f1:c1, timestamp=1476711517300, value=4
1 row(s) in 0.0140 seconds
hbase(main):028:0> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.RandomRowFilter.new(0.5), LIMIT => 1}
ROW                                                  COLUMN+CELL
 1                                                   column=f1:c1, timestamp=1476711506608, value=1
1 row(s) in 0.0060 seconds
hbase(main):029:0> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.RandomRowFilter.new(0.5), LIMIT => 1}
ROW                                                  COLUMN+CELL
 1                                                   column=f1:c1, timestamp=1476711506608, value=1
1 row(s) in 0.0150 seconds
hbase(main):030:0> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.RandomRowFilter.new(0.5), LIMIT => 1}
ROW                                                  COLUMN+CELL
 2                                                   column=f1:c1, timestamp=1476711511117, value=2
1 row(s) in 0.0070 seconds

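If you are using TIMESTAMP => 11, make sure enough rows (more than 2) actually match it. As far as I know, TIMESTAMP in the shell selects cells with exactly that timestamp rather than everything older, so for a range of timestamps TIMERANGE is probably what you want.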
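
One thing that may explain why the early rows show up so often with LIMIT => 1: RandomRowFilter does not pick a row uniformly at random. The scan still walks rows in key order, and each row passes with probability `chance`, so the first row that passes wins. Under that reading, row k is returned with probability chance * (1 - chance)^(k - 1), which heavily favours the start of the table. Below is a minimal Python sketch of that behaviour, not the HBase implementation; the function names `random_row_scan` and `first_pick_distribution` are made up for illustration, and a single sorted key list stands in for the table.

```python
import random
from collections import Counter


def random_row_scan(row_keys, chance, limit, rng):
    """Simulate a scan with a random row filter and a row limit.

    Rows are visited in sorted key order (as a scan would), each row is
    kept with probability `chance`, and the scan stops after `limit` rows.
    """
    picked = []
    for key in sorted(row_keys):
        if rng.random() < chance:  # this row passes the filter
            picked.append(key)
            if len(picked) == limit:
                break
    return picked


def first_pick_distribution(row_keys, chance, trials=10_000, seed=0):
    """Count how often each row is the single row returned with limit=1.

    The key None counts the trials in which no row passed the filter.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        result = random_row_scan(row_keys, chance, limit=1, rng=rng)
        counts[result[0] if result else None] += 1
    return counts


if __name__ == "__main__":
    # Same four row keys as the 't1' example above, chance = 0.5.
    counts = first_pick_distribution(["1", "2", "3", "4"], chance=0.5)
    for key in ["1", "2", "3", "4", None]:
        print(key, counts[key])
```

With chance = 0.5 the first row should come back roughly half the time, the second about a quarter, and so on, which is consistent with row 1 appearing in three of the five scans above. A smaller chance flattens the skew on a large table, at the cost of sometimes returning nothing on a small one.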

Re: How to Randomly display the rows in HBase?

Expert Contributor

It still does not work. Also, how can I add another filter (KeyOnlyFilter()) along with this one? I tried combining them with AND, but that did not work.
