05-31-2017 03:51 AM - edited 05-31-2017 03:52 AM
In our cluster (CM 5.8.4 parcel 5.8.4-1.cdh5.8.4.p0.5, HBase 1.2.0-cdh5.8.4) we would like to retrieve rows from HBase in the simplest available way using a REST API.
To retrieve multiple rows with a single REST call (GET), we can use the "Globbing Rows" REST API documented here -> https://hbase.apache.org/1.2/book.html#_rest in this way:
(example from documentation: http://example.com:8000/urls/https|ad.doubleclick.net|*)
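For anyone who wants to try the documented globbing call from a script, here is a minimal sketch in Python. The helper name glob_url and the choice of the JSON Accept header are my own assumptions, not something from the documentation example. I also assume the REST server URL-decodes the row key, so the "|" characters are percent-encoded. The request is only built here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def glob_url(base: str, table: str, row_prefix: str) -> str:
    """Build a 'Globbing Rows' URL: <base>/<table>/<row_prefix>*

    glob_url is a hypothetical helper. The row prefix is percent-encoded
    and the trailing '*' is appended unencoded as the wildcard.
    """
    encoded_prefix = quote(row_prefix, safe="")
    return f"{base.rstrip('/')}/{quote(table, safe='')}/{encoded_prefix}*"


# Same table and row prefix as the documentation example above.
url = glob_url("http://example.com:8000", "urls", "https|ad.doubleclick.net|")

# Ask for JSON; the request object is only constructed, nothing is sent.
request = Request(url, headers={"Accept": "application/json"})
print(request.full_url)
```

Sending it would be a matter of passing request to urllib.request.urlopen against a real REST server.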
We discovered an undocumented behaviour: we can also call the globbing REST API with startrow,endrow parameters in this way:
In this case it works exactly like a scan (and it breaks the REST server if you try to retrieve too many rows).
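For comparison, the documented scanner endpoint takes the same kind of row range but lets the client set a batch size, so each call returns a bounded amount of data. Below is a minimal sketch of the XML body for creating such a scanner (sent with PUT to /<table>/scanner and Content-Type text/xml). The helper name scanner_body and the sample row keys are assumptions for illustration only. Row keys are base64-encoded in the XML body.

```python
import base64


def scanner_body(start_row: bytes, end_row: bytes, batch: int = 100) -> str:
    """Return the XML body describing a bounded scanner.

    scanner_body is a hypothetical helper; startRow and endRow are
    base64-encoded, and batch limits how much each fetch returns.
    """
    start = base64.b64encode(start_row).decode("ascii")
    end = base64.b64encode(end_row).decode("ascii")
    return f'<Scanner batch="{batch}" startRow="{start}" endRow="{end}"/>'


# Illustrative row keys, not taken from our tables.
body = scanner_body(b"row1", b"row9", batch=10)
print(body)
```

The server answers the PUT with a Location header pointing at the new scanner; repeated GETs on that location return one batch each, and a DELETE releases it.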
We also looked at this HBase source code mirror (https://github.com/apache/hbase/blob/master/hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/Ro...) and noticed that the code is intended to work like this (function private int parseRowKeys(final String path, int i), lines 65-110).
Why is this behaviour not documented?
Is it safe to use this REST call (with restrictions) instead of a scanner, or does the absence of documentation mean that the function may change without any communication (not even a "deprecated" notice)?
06-27-2017 01:54 AM
We engaged vendor professional support and we discovered that:
1- There is an internal thread about the HBase REST API documentation, which needs enhancement
2- It is possible to use this endpoint, but it is not safe if you do not control the clients that call the service. Because it can cause an OOM, it is better (if not mandatory) to use Kerberos authentication for the HBase REST API