How to retrieve entire row including all columns in HBase when queried by time range in the scan?

New Contributor

Currently, the scan returns only the columns that were updated within the time range, but I need the entire row, including the other columns. How do I do that? Here is a snippet of my code. Please help!

Scan scan = new Scan();
scan.setTimeRange(1471710010773L, System.currentTimeMillis());
Re: How to retrieve entire row including all columns in HBase when queried by time range in the scan?

Super Collaborator

What is the value of the VERSIONS attribute of the table?

How many rows do you expect the scan to return within the time range?

Re: How to retrieve entire row including all columns in HBase when queried by time range in the scan?

New Contributor

Thanks a lot for the quick reply. I appreciate it.

Basically, what I am looking for is this: if none of the columns of a given row were updated within the time range, that row should not be returned in the results. This part works without any issue.

But if any of the columns were updated, I would like to see the entire row in the results, including the columns that were not updated, instead of just the updated ones.

"What is the value of the VERSIONS attribute of the table?" It has multiple versions.

"How many rows do you expect the scan to return within the time range?" The rows are being returned correctly, but I am not getting the other columns that were not updated.

Re: How to retrieve entire row including all columns in HBase when queried by time range in the scan?

Super Collaborator

A workaround would be to issue Gets for the row keys retrieved from the Scan.

Use the following API from (H)Table:

Result[] get(List<Get> gets) throws IOException;
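With a large number of row keys, it may help to send the Gets in fixed-size batches rather than in one huge list. The following is only a minimal sketch of that batching step in plain Java, not HBase code: the class name BatchedRowFetch, the method fetchInBatches, and the multiGet function parameter are hypothetical. In a real client, multiGet would be expected to build a Get for each row key in the batch and pass the list to the get(List<Get>) call above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper (not part of HBase): fetches rows in fixed-size batches. */
final class BatchedRowFetch {
    private BatchedRowFetch() {}

    /**
     * Calls multiGet once per batch of at most batchSize row keys and
     * concatenates the results in the original key order.
     */
    static <K, R> List<R> fetchInBatches(List<K> rowKeys, int batchSize,
                                         Function<List<K>, List<R>> multiGet) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<R> results = new ArrayList<>(rowKeys.size());
        for (int start = 0; start < rowKeys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rowKeys.size());
            // Assumption: with HBase, this is where the batch of row keys would be
            // turned into Get objects and handed to get(List<Get>).
            results.addAll(multiGet.apply(rowKeys.subList(start, end)));
        }
        return results;
    }
}
```

The batch size is a tuning knob; the sketch does not suggest a value, since a good one depends on row size and cluster load.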

Re: How to retrieve entire row including all columns in HBase when queried by time range in the scan?

New Contributor

Thanks for the reply again. Is it possible to achieve this with a single query? We are dealing with millions of rows in the results of the scan.

Re: How to retrieve entire row including all columns in HBase when queried by time range in the scan?

Super Collaborator

I need to dig deeper.

You can write an endpoint coprocessor that does the retrieval on the server side, but it is non-trivial.

Re: How to retrieve entire row including all columns in HBase when queried by time range in the scan?

@Lee K

In addition to the other answers in this thread, you could give Apache Phoenix a shot. It would address your problem, and similar problems in the future, without expensive development. It is just SQL on top of HBase.

Re: How to retrieve entire row including all columns in HBase when queried by time range in the scan?

Guru

You cannot use the Scan's time range for this because, as you have guessed, HBase is not a row-oriented engine but a cell-oriented one. The correct approach is to write a Filter that decides whether or not to include the whole row. To do that, set the time range in your filter, override the Filter.filterRow() and filterKeyValue() methods, keep state for the current row, and decide whether to include the row based on whether any of its Cells match the time range. You can find example filters to look at in the HBase source code or elsewhere.
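To make the per-row state concrete, here is a minimal plain-Java model of the decision logic described above. It is not an HBase Filter and will not plug into a scan as written: the class RowTimeRangeModel and its methods are hypothetical stand-ins for the filterKeyValue(), filterRow(), and reset() callbacks, and cells are reduced to their timestamps. The range is assumed to be [minStamp, maxStamp), the same convention Scan.setTimeRange uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of a row-level time-range filter; not an HBase class. */
final class RowTimeRangeModel {
    private final long minStamp; // inclusive
    private final long maxStamp; // exclusive
    private boolean rowHasMatch; // state kept for the current row only

    RowTimeRangeModel(long minStamp, long maxStamp) {
        this.minStamp = minStamp;
        this.maxStamp = maxStamp;
    }

    /** Stands in for reset(): clears the state before each new row. */
    void reset() {
        rowHasMatch = false;
    }

    /** Stands in for filterKeyValue(): notes a match but keeps every cell. */
    boolean includeCell(long timestamp) {
        if (timestamp >= minStamp && timestamp < maxStamp) {
            rowHasMatch = true;
        }
        return true;
    }

    /** Stands in for filterRow(): true means the whole row is dropped. */
    boolean filterRow() {
        return !rowHasMatch;
    }

    /** Runs the model over one row; returns the kept timestamps, or an empty list. */
    List<Long> apply(long[] cellTimestamps) {
        reset();
        List<Long> kept = new ArrayList<>();
        for (long ts : cellTimestamps) {
            if (includeCell(ts)) {
                kept.add(ts);
            }
        }
        if (filterRow()) {
            kept.clear(); // no cell fell in the range, so the row is excluded
        }
        return kept;
    }
}
```

A real filter would likely also need to return true from hasFilterRow() so that filterRow() is consulted, handle filter serialization, and be deployed to the region servers; none of that is shown here. In HBase 2.x the per-cell callback is named filterCell(), with filterKeyValue() deprecated.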

Re: How to retrieve entire row including all columns in HBase when queried by time range in the scan?

New Contributor

Thanks for the reply. Do you have some samples I can take a look at? Thanks in advance!