I have an HBase table that needs to be exposed as a web service using NiFi.
My intended flow is HandleHttpRequest -> ExtractText -> (GetHBase/FetchHBase) -> HandleHttpResponse.
However, I am unable to connect the processors request -> HBase -> response at all. Is there anything I am missing?
Is it possible to query an HBase table using a request URL parameter and send the result in JSON format to an external system through a REST API?
Are there any limitations (content type or size) when using NiFi to send content over REST?
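As a rough sketch of the calling side, assuming HandleHttpRequest listens on a hypothetical `http://nifi-host:8011/hbase` and the flow replies with a JSON body, an external system could pass the row key as an `id` URL parameter and decode the reply using only the Python standard library. The host, port, path, and parameter name are placeholders, not values from this thread.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint exposed by HandleHttpRequest; host, port and path are assumptions.
BASE_URL = "http://nifi-host:8011/hbase"


def build_request_url(row_id: str, base_url: str = BASE_URL) -> str:
    """Encode the HBase row key as the `id` query parameter."""
    return base_url + "?" + urllib.parse.urlencode({"id": row_id})


def fetch_row(row_id: str, timeout: float = 10.0) -> dict:
    """Call the NiFi endpoint and decode the JSON body it returns."""
    with urllib.request.urlopen(build_request_url(row_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`fetch_row("row1")` would issue `GET http://nifi-host:8011/hbase?id=row1`. URL-encoding the key keeps row keys that contain spaces or slashes intact.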
It seems like overkill to use NiFi to create a web service. Have you explored the HBase REST server? https://hbase.apache.org/book.html#_rest
If there is context missing about why NiFi must be used, please add it. Also, can you describe the GetHBase/FetchHBase processor configuration and what your data in HBase looks like?
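For comparison, the HBase REST server returns a single row for `GET http://<rest-host>:<port>/<table>/<row-key>` when the request carries `Accept: application/json`. Row keys, column names (`family:qualifier`), and cell values in that response are base64-encoded, so the client has to decode them. A minimal sketch, assuming the cell values are UTF-8 text (binary values would need different handling):

```python
import base64
import json


def decode_hbase_rest_row(body: str) -> dict:
    """Turn an HBase REST JSON row response into {row_key: {"cf:qual": value}}.

    Assumes every key, column name and value is UTF-8 text once base64-decoded.
    """
    result = {}
    for row in json.loads(body).get("Row", []):
        key = base64.b64decode(row["key"]).decode("utf-8")
        cells = {}
        for cell in row.get("Cell", []):
            column = base64.b64decode(cell["column"]).decode("utf-8")
            # The cell value is stored under the "$" field.
            cells[column] = base64.b64decode(cell["$"]).decode("utf-8")
        result[key] = cells
    return result
```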
The web service must be able to respond to external calls from outside the Hadoop cluster. The original idea was to write a Java wrapper that reads the external request, queries HBase, and is deployed as a web service on a web server. But that approach needs a web server to deploy and run it. There is no rule that we must use NiFi, but since the capability is there, we could make good use of it.
For GetHBase, an HBase client service (Hbase_client_service) was created and connected. The response has to be the HBase table data, so no inserts or updates will happen in HBase. It seems that GetHBase is only used for putting the data into
I have yet to try other options. If you have come across any similar scenarios and can share the processor connections, that would be great.
I don't have numbers, but I would expect some additional latency per request from using NiFi rather than a web service that talks to HBase directly. Under the hood, NiFi does a number of things that most REST applications don't need, such as persisting each request as a flow file and recording provenance events.
The GetHBase processor is a source processor that does not take input. There is a new FetchHBase processor that hasn't been released yet that can retrieve a row from HBase based on an incoming flow file, so you could connect HandleHttpRequest to FetchHBase, and then to HandleHttpResponse.
Keep in mind that building a web service in NiFi isn't really meant for end-user applications with many concurrent users hitting it; it is better suited to system-to-system communication. For an end-user application you are probably better off with the HBase REST API that Josh pointed out, or some other custom REST service in front of HBase.
Yes, as you said, I initially tried GetHBase and learned that it isn't built for that purpose. I then tried FetchHBaseRow, but it fails saying the 'scan' method is not found.
Now I am trying Python. The route is HandleHttpRequest -> ExecuteStreamCommand -> HandleHttpResponse. Here, ExecuteStreamCommand runs a Python script that reads the request data, parses it, and queries HBase to retrieve the values. But again, Python has no built-in library for fetching HBase data. Is there any library for querying HBase from Python apart from HappyBase?
I tried using the plain HBase REST server, but since HBase is Kerberos-enabled, it does not let us connect from an external system. Any input is highly appreciated.
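A minimal skeleton for the script ExecuteStreamCommand would run, assuming the row key arrives as the first command-line argument (for example by setting Command Arguments to `${http.query.param.id}`, which HandleHttpRequest should populate for a `?id=...` query parameter). The `lookup` function is a hypothetical placeholder for whichever HBase client is eventually chosen; here it is backed by an in-memory dictionary so the skeleton runs as-is. Whatever the script writes to stdout becomes the content of the outgoing flow file, which HandleHttpResponse can then return.

```python
import json
import sys
from typing import Callable, Dict, List, Tuple

# Hypothetical lookup signature: row key -> {"family:qualifier": value}.
RowLookup = Callable[[str], Dict[str, str]]


def handle_request(argv: List[str], lookup: RowLookup) -> Tuple[int, str]:
    """Return (exit_code, json_body) for the row key given in argv[1]."""
    if len(argv) < 2 or not argv[1].strip():
        return 1, json.dumps({"error": "missing row id"})
    row_id = argv[1].strip()
    return 0, json.dumps({"row": row_id, "columns": lookup(row_id)})


def main() -> int:
    # Placeholder data standing in for the HBase table; replace this lookup
    # with a real HBase client call (not shown, the library is undecided).
    demo_table = {"row1": {"cf1:name": "alice"}}
    code, body = handle_request(sys.argv, lambda key: demo_table.get(key, {}))
    # ExecuteStreamCommand captures stdout as the outgoing flow file content.
    sys.stdout.write(body)
    return code


if __name__ == "__main__":
    sys.exit(main())
```

Keeping the lookup separate from request handling lets the HBase access be swapped without touching the NiFi-facing part. ExecuteStreamCommand records the exit code in the `execution.status` attribute, so a RouteOnAttribute step could use it to return an error status through HandleHttpResponse.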
The HBase REST server works with Kerberos enabled, but this is a completely different question than what you have asked here. Please open a new question and provide the relevant configuration values you have set, what you are executing, and the error you are experiencing.
@Josh Elser FetchHBaseRow configuration:
HBase Client Service: Hbase_1_1_2_ClientService
Table Name: hbase_table
Row Identifier: id
Columns: cf1
Destination: flowfile-content
I configured "Hbase_1_1_2_ClientService" with the Kerberos principal and keytab.
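One thing worth checking in the configuration above: FetchHBaseRow's Row Identifier property accepts NiFi Expression Language, so a plain value of `id` would be treated as the literal row key "id" rather than the value sent in the request. Pointing it at the request parameter would look something like `${http.query.param.id}` (the attribute name assumes the client sends `?id=...`). The snippet below is a deliberately simplified stand-in for Expression Language, not NiFi's implementation; it only resolves plain `${attribute}` references, to show the difference:

```python
import re
from typing import Mapping

# Matches simple ${attribute.name} references; real NiFi Expression Language
# also supports functions and nesting, which this sketch ignores.
_ATTR_REF = re.compile(r"\$\{([^}]+)\}")


def resolve_row_identifier(value: str, attributes: Mapping[str, str]) -> str:
    """Substitute ${name} references with flow file attribute values.

    Text outside ${...} is kept literally; unknown attributes become "".
    """
    return _ATTR_REF.sub(lambda m: attributes.get(m.group(1), ""), value)


attrs = {"http.query.param.id": "row1"}
print(resolve_row_identifier("id", attrs))                      # literal: id
print(resolve_row_identifier("${http.query.param.id}", attrs))  # resolved: row1
```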