We are experiencing intermittent issues when connecting to our HBase cluster via Knox / REST.
We are having trouble understanding what the Knox audit log is telling us.
A sample of the Knox audit log entries for a single request is shown below:
17/08/28 12:04:19 ||29cb79b2-383a-49de-b1aa-92465ae4735f|audit|WEBHBASE||||access|uri|<REDACTED>|unavailable|
17/08/28 12:04:19 ||29cb79b2-383a-49de-b1aa-92465ae4735f|audit|WEBHBASE|<USER>|||authentication|uri|<REDACTED>|success|
17/08/28 12:04:19 ||29cb79b2-383a-49de-b1aa-92465ae4735f|audit|WEBHBASE|<USER>|||authentication|uri|<REDACTED>|success|Groups: 
17/08/28 12:04:19 ||29cb79b2-383a-49de-b1aa-92465ae4735f|audit|WEBHBASE|<USER>|||authorization|uri|<REDACTED>|success|
17/08/28 12:04:24 ||29cb79b2-383a-49de-b1aa-92465ae4735f|audit|WEBHBASE|<USER>|||dispatch|uri|<REDACTED>|success|Response status: 200
17/08/28 12:04:24 ||29cb79b2-383a-49de-b1aa-92465ae4735f|audit|WEBHBASE|<USER>|||access|uri|<REDACTED>|success|Response status: 200
I have removed the username and the REST endpoint / query from the log.
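To measure the gaps between stages across a whole audit log rather than by eye, something like the following Python sketch could be used. The field positions (timestamp first, request ID third, action ninth, outcome twelfth when splitting on `|`) and the yy/MM/dd timestamp format are assumptions inferred from the sample above, and the function names are only illustrative.

```python
from collections import defaultdict
from datetime import datetime

# Assumed timestamp layout, inferred from entries like "17/08/28 12:04:19".
TIMESTAMP_FORMAT = "%y/%m/%d %H:%M:%S"


def parse_line(line):
    """Split one pipe-delimited audit line; return None if it does not fit."""
    fields = line.rstrip("\n").split("|")
    if len(fields) < 12:
        return None
    try:
        timestamp = datetime.strptime(fields[0].strip(), TIMESTAMP_FORMAT)
    except ValueError:
        return None
    return {
        "time": timestamp,
        "request_id": fields[2],
        "action": fields[8],
        "outcome": fields[11],
    }


def stage_gaps(lines):
    """Return, per request ID, the seconds elapsed between consecutive entries."""
    by_request = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            by_request[entry["request_id"]].append(entry)

    gaps = {}
    for request_id, entries in by_request.items():
        # Entries are kept in file order, which is assumed to be time order.
        gaps[request_id] = [
            (prev["action"], cur["action"],
             (cur["time"] - prev["time"]).total_seconds())
            for prev, cur in zip(entries, entries[1:])
        ]
    return gaps
```

For the sample above this reports a gap of 5.0 seconds between the "authorization" and "dispatch" entries and 0.0 seconds between every other pair. Run over a full log, it would show whether the slow requests always stall at that same point.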
The questions I have are as follows:
1. There is a 5s delay evident between the "authorization" log entry and the "dispatch" log entry.
I'm not sure how to interpret this, because I don't know what Knox is doing under the hood during that interval.
It could be:
- A delay in the authorization step (Knox is waiting for auth). If the "authorization" entry is written at the start of that step, a slow authorization would show up as a gap before the "dispatch" entry. I would assume that authorization is quick, but I am not sure.
- A delay in the dispatch step (Knox is waiting on the HBase REST API to return). If the "dispatch" entry is written only after the dispatch has completed, a slow backend call would show up as the same gap. This seems the more likely candidate, as it involves setting up a connection to HBase REST, submitting the query, and waiting for a response.
Does anyone know which of the two options above is correct?
Also, are there any ways to fix this delay?
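One way we could try to tell the two cases apart is to time the same request sent directly to HBase REST and sent through Knox. The sketch below is a rough timing harness in Python. The URLs in the comments are placeholders, not our real endpoints, authentication and TLS trust settings are left out, and the function names are only illustrative.

```python
import statistics
import time
import urllib.request


def fetch(url, timeout=30):
    """GET the URL and read the full body so the timing covers the whole response."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()
        return response.status


def time_calls(call, repeats=20):
    """Invoke call() repeatedly and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min, median, and max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Example usage with placeholder hosts (assumptions, adjust to the real setup):
#   direct = "http://hbase-rest.example.com:8080/version/cluster"
#   via_knox = "https://knox.example.com:8443/gateway/default/hbase/version/cluster"
#   print("direct  ", summarize(time_calls(lambda: fetch(direct))))
#   print("via Knox", summarize(time_calls(lambda: fetch(via_knox))))
```

If the occasional 5 s outliers also appear on the direct path, that would point at HBase REST. If they only appear through Knox, that would point at the gateway or its authentication and authorization providers.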
2. The first log entry has a status of "unavailable".
Is this of any concern? All log entries, including non-delayed ones, have this status so I have assumed it is not serious.
Some details about our cluster:
- The delays do not seem to be related to query load / concurrency
- We run around 30 queries per second, which HBase is supposed to handle. I am not sure whether the HBase REST API or the Knox gateway is rated for this level of load.
- We have plenty of hardware - more than 10 worker nodes. The HBase REST and Knox services are both running on the same master node.
- The queries are small and return maybe 10 rows and 30 columns at most from HBase
Based on the above, we don't suspect a hardware issue or an issue with HBase itself (although anything is possible). We suspect the issue lies with Knox or the HBase REST API.