Hive webservice with WebHCat or HiveServer2

Contributor

Hey,

I want to set up a REST webservice, and I read that WebHCat is a REST API for HCatalog, which is a part of Hive. So as I understand it, I need to use WebHCat for my REST API webservice.

But I also read that HiveServer2 provides a REST API for Hive over a cliservice as well.

Now I'm a little bit confused...

1. What is the difference between HiveServer2 and WebHCat?

2. What is the best practice to set up a REST API webservice for Hive on HBase?

3. Is there any tutorial which shows how to set up a REST API webservice? There are so many tutorials, but nothing on this topic.

It would be great if someone could provide me with information so I can set up a REST API webservice with Hive on HBase.

~Jan

9 REPLIES

Super Collaborator

Hi Jan

WebHCat is the REST interface for HCatalog, Hadoop's table and metadata management layer. HCatalog stores schema-related information for both Pig and Hive. Please see the tutorial linked below.

HiveServer2 is the service that actually runs Hive queries. You could

http://hortonworks.com/hadoop-tutorial/how-to-use-hcatalog-basic-pig-hive-commands/

Data can be accessed via the WebHCat REST APIs, which in turn call the Hive API.

More references:

https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference

https://cwiki.apache.org/confluence/display/Hive/Hive+APIs+Overview#HiveAPIsOverview-WebHCat%28REST%...

Super Guru
@Jan Horton

WebHCat, originally named Templeton, is a REST API for HCatalog and related Hadoop components. HCatalog is a sub-component of Hive that serves table metadata across Hadoop computation engines such as MapReduce, Pig, and Hive. It provides public APIs and webservice wrappers for accessing metadata in the Hive metastore, and WebHCat exposes this information through REST webservices. WebHCat also allows users to submit (queue) Hadoop streaming, Hadoop MapReduce, Pig, and Hive jobs to the cluster, but it is not ideal for regular analytical processing because it only queues the job and does not return the query results.
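To make the metadata side concrete, here is a minimal sketch of calling the WebHCat DDL resource that lists the tables in a database. It assumes WebHCat is running on its default port 50111; the class name `WebHCatSketch`, the host, and the user `hive` are placeholders for illustration only.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class WebHCatSketch {

    // Builds the WebHCat URI that lists the tables of one database.
    // WebHCat expects the calling user in the "user.name" query parameter.
    static URI listTablesUri(String host, int port, String database, String user) {
        String db = URLEncoder.encode(database, StandardCharsets.UTF_8);
        String u = URLEncoder.encode(user, StandardCharsets.UTF_8);
        return URI.create("http://" + host + ":" + port
                + "/templeton/v1/ddl/database/" + db + "/table?user.name=" + u);
    }

    // Sends the GET request and returns the raw JSON body (metadata only, no query results).
    static String listTables(String host, int port, String database, String user)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(listTablesUri(host, port, database, user))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only prints the URI; call listTables(...) against a cluster where WebHCat is reachable.
        System.out.println(listTablesUri("localhost", 50111, "default", "hive"));
    }
}
```

This only reads metadata. Submitting a Hive job through WebHCat queues it on the cluster instead of returning rows, which is the limitation described above.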

HiveServer2 is another Hive component: a Thrift service that supports multiple concurrent clients connecting through JDBC/ODBC, which makes it better suited to large analytical workloads. For more, see https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
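As a rough illustration of the JDBC route, the sketch below builds the two common HiveServer2 connection URLs and runs a query. The "cliservice" mentioned in the question is the default HTTP path HiveServer2 uses when its Thrift service runs in HTTP transport mode, so it is still Thrift over HTTP and not a REST API. The class name, host, ports, and user are assumptions, and the Hive JDBC driver has to be on the classpath for an actual connection.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveServer2JdbcSketch {

    // URL for the default binary (TCP) Thrift transport, usually port 10000.
    static String binaryUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // URL for HTTP transport mode, usually port 10001 with the path "cliservice".
    static String httpUrl(String host, int port, String database) {
        return binaryUrl(host, port, database) + ";transportMode=http;httpPath=cliservice";
    }

    // Runs a query and prints the first column of every row.
    static void printFirstColumn(String jdbcUrl, String user, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }

    public static void main(String[] args) {
        // Only prints the URLs; printFirstColumn(...) needs a running HiveServer2.
        System.out.println(binaryUrl("localhost", 10000, "default"));
        System.out.println(httpUrl("localhost", 10001, "default"));
    }
}
```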

Contributor

@Satish Bomma & @Rajkumar Singh

Thank you both for the answers and the information provided. So if I understand correctly, I can only submit/execute Hive queries using WebHCat, and I do not get any result back. If this is correct, how can I set up a REST webservice where I can submit/execute a query like "localhost:4200/userid/1" and get the result as an HTTP response in JSON format, for example?

Super Guru

@Jan Horton

HUE: takes advantage of WebHCat and WebHDFS to achieve the desired outcome.

Ambari Hive View: uses the Thrift service to achieve the same. You can also use the Thrift service to implement your own webservice.

https://github.com/apache/ambari/tree/trunk/contrib/views/hive/src/main/java/org/apache/ambari/view/...

One more approach I can suggest is to write a simple Hive Thrift client that sends a request (TFetchResultsReq) to HiveServer2 and receives the query result (TFetchResultsResp). Please see the sample program:

https://gist.github.com/rajkrrsingh/4ab7153ca90969dcad21

Contributor

@Rajkumar Singh

Ambari Hive View and HUE are similar tools, aren't they? I used Ambari Hive View to do the tutorials in the sandbox and other examples. But I think it is just a web-based tool that replaces the shell. I don't see any way to implement a REST webservice with it, or am I wrong?

The last approach with the Hive Thrift client looks better, and thank you for the sample program. But I do not think that I can implement a REST webservice with it.

It confuses me that there is no easy way to set up a REST webservice on Hadoop. I have an HBase table that I want to read from different clients over HTTP, and it seems not to be possible?!

Super Guru
@Jan Horton

Ambari View and Hue do similar things, but they differ in implementation. Hue uses the WebHCat and WebHDFS REST APIs to query Hive, while Ambari View does it through a Thrift client.

If you follow the Ambari View code base and the sample program given here, you can adapt them to expose your own REST service.

Regarding your last comment ("I have a HBase table where I want to read data from different clients through http and it seems to be not possible?!"): it is possible through the HBase REST API. You can read more here:

https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/rest/package-summary.html#operation_query_s...
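For illustration, a single-row read through the HBase REST server could look roughly like the sketch below. It assumes the REST server listens on its default port 8080 and that the table name and row key are URL-safe. The class name, the table `users`, and the row key `row1` are made up for the example. In the JSON representation the REST server returns row keys, column names, and cell values Base64-encoded, so the client has to decode them.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class HBaseRestSketch {

    // Builds a GET request for one row: http://host:port/<table>/<rowKey>.
    // The Accept header asks the REST server for a JSON representation.
    static HttpRequest rowRequest(String host, int port, String table, String rowKey) {
        URI uri = URI.create("http://" + host + ":" + port + "/" + table + "/" + rowKey);
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // Decodes one Base64-encoded key, column name, or cell value from the JSON response.
    static String decodeCell(String base64Value) {
        return new String(Base64.getDecoder().decode(base64Value), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Only builds the request; send it with java.net.http.HttpClient against a real REST server.
        HttpRequest request = rowRequest("localhost", 8080, "users", "row1");
        System.out.println(request.uri());
        System.out.println(decodeCell("SmFu")); // prints "Jan"
    }
}
```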

Contributor
@Rajkumar Singh

I'm starting to understand it. Thank you very much, and I hope you can help me a little more, because it is not 100% clear yet. I know that HBase has a REST API called Stargate, but I think it has the same problem as Hive, doesn't it? I only wanted to use a REST webservice with Hive (over WebHCat) on HBase because I thought it was the easiest way. If it is better to set up the webservice with HBase (using Stargate), I can try that.

If I understand you correctly, there is no standard solution for my problem. I have to write my own REST service based on the Ambari View code and your sample code. Right?

I'm just a little surprised, because I believe this is a standard problem for which a ready-made solution should already exist.

Super Guru

@Jan Horton I am not aware of another ready-made solution, but most people rely on Hue and WebHDFS to achieve this. Here is my view on your problem statement, where you want a REST webservice to query HBase tables managed by Hive.

For the sake of simplicity, you can write a Hive JDBC client and expose it as a REST service. Here is a sample program that queries Hive using JDBC:

https://github.com/rajkrrsingh/HiveServer2JDBCSample/blob/master/src/main/java/HiveJdbcClient.java
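As a sketch of the "expose it as a REST service" part, the class below uses the HTTP server that ships with the JDK to answer requests such as localhost:4200/userid/1 from the original question. It is an assumption-laden outline and not a finished service: the class name is invented, and `lookup` is a hypothetical function standing in for the JDBC client above combined with the JSON formatting shown further down in this reply.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class HiveRestSketch {

    // Extracts the numeric id from a path such as "/userid/1"; returns null if it does not match.
    static String parseUserId(String path) {
        String prefix = "/userid/";
        if (path == null || !path.startsWith(prefix)) {
            return null;
        }
        String id = path.substring(prefix.length());
        return id.matches("\\d+") ? id : null;
    }

    // Starts an HTTP server that maps /userid/<id> to a JSON response.
    // "lookup" is hypothetical: it should run the Hive query for the id and return a JSON string.
    static HttpServer start(int port, Function<String, String> lookup) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/userid/", exchange -> {
            String id = parseUserId(exchange.getRequestURI().getPath());
            int status = (id == null) ? 400 : 200;
            String body = (id == null) ? "{\"error\":\"invalid id\"}" : lookup.apply(id);
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(bytes);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) {
        // Only demonstrates the path parsing; call start(4200, lookup) to actually serve requests.
        System.out.println(parseUserId("/userid/1"));
    }
}
```

Accepting only digits for the id keeps arbitrary text out of the query. Inside `lookup`, a PreparedStatement with a bound parameter would be the safer way to pass the id to Hive.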

To return the result set in JSON format, your REST service can do something like this:

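import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import org.json.JSONException;
import org.json.JSONObject;

public static JSONObject getFormattedResult(ResultSet res) throws JSONException {
    List<JSONObject> resList = new ArrayList<JSONObject>();
    JSONObject hd = new JSONObject();
    try {
        // Get the column names (JDBC columns are 1-indexed).
        ResultSetMetaData rsMeta = res.getMetaData();
        int columnCnt = rsMeta.getColumnCount();
        List<String> columnNames = new ArrayList<String>();
        for (int i = 1; i <= columnCnt; i++) {
            columnNames.add(rsMeta.getColumnName(i).toUpperCase());
        }
        // Convert each row to a human-readable JSON object.
        while (res.next()) {
            JSONObject obj = new JSONObject();
            for (int i = 1; i <= columnCnt; i++) {
                String key = columnNames.get(i - 1);
                String value = res.getString(i);
                obj.put(key, value);
            }
            resList.add(obj);
        }
        // The original post is cut off here; the lines below are an assumed completion.
        hd.put("results", resList); // "results" is a hypothetical key name
    } catch (SQLException e) {
        hd.put("error", e.getMessage()); // assumed error handling
    }
    return hd;
}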

I hope this makes it clearer and simpler now.