
Where to Host API within Hadoop for Knox Query

Explorer

I would like to host an API from within Hadoop to be queried from Knox. I am able to query APIs from around the web, but how would I go about hosting my own API and placing it within Hadoop?

As a test, I have built a basic temperature units converter API in Eclipse and I'm hosting it on my Tomcat server. I would like to host this in Hadoop so it can be queried from Knox (the Knox query part I can already do).

From reading the documentation, I think I will need to host it on a Jetty server, but I have no idea whether that's possible and, if so, where to host it.

Any help would be much appreciated.


3 REPLIES


Thinking about running it "within Hadoop" may be the wrong way to think about it. From a Knox perspective, "within Hadoop" is usually discussed from a security perspective: everything "within Hadoop" is protected by the same firewall setup, the same Kerberos configuration, etc. It is really just a collection of hosts dedicated to the various components that make up the solution. So in your case the Tomcat-hosted API could simply run on one of the hosts that is part of the infrastructure dedicated to Hadoop. This API would be accessed via Knox, which would be running on one of the hosts considered to be at the "perimeter" from a network security perspective.
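For example, once your converter API is running on a host inside the cluster perimeter, you would point a Knox topology at it with a service entry along these lines. The role name, host, and path below are made up for illustration, and a custom role also needs a matching service definition (service.xml and rewrite.xml) under Knox's data/services directory:

    <!-- excerpt from a Knox topology file, e.g. conf/topologies/default.xml -->
    <topology>
      <service>
        <!-- hypothetical custom role; requires a matching service definition -->
        <role>TEMPCONVERTER</role>
        <!-- the Tomcat host inside the Hadoop perimeter where the API runs -->
        <url>http://internal-api-host:8080/converter</url>
      </service>
    </topology>

A client outside the perimeter would then reach the API through the gateway with something like https://knox-host:8443/gateway/default/tempconverter/..., where the exact path depends on the routes declared in your service definition.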

All of the above being said, it is actually possible to run your Jetty- or Tomcat-hosted API "on Hadoop" via Slider. In this case the lifecycle of your API server would be managed by Hadoop. This would present some challenges from a Knox perspective, as the Hadoop resource manager (YARN) may run your API server on any compute node in the cluster, making the Knox configuration challenging.

Expert Contributor

There is currently no end-to-end API management story for hosting APIs on Hadoop. I have spent some time thinking and talking about this idea, and with some customer-driven use cases we may be able to get it on the roadmap.

In the meantime, I would consider using Slider to deploy a service that you can host on Tomcat or Jetty, etc. I know there has been some recent work on deploying Tomcat to YARN via Slider, which may be worth looking into. See the work done as part of https://issues.apache.org/jira/browse/SLIDER-1012.
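To give a rough idea, the Slider workflow looks something like the following, assuming you have already built a Slider application package for your Tomcat- or Jetty-hosted service; the instance name and file names here are just placeholders:

    # create and start an application instance on YARN from a packaged app
    slider create tempconverter --template appConfig.json --resources resources.json

    # check which YARN containers/hosts the instance was placed on
    slider status tempconverter

As noted above, YARN decides the placement, so Knox (or some registry-based discovery) still has to be told where the containers actually landed.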

If you would like to bring to the Knox community a more holistic API management use case, one that incorporates the use of Slider to deploy and publish APIs to the YARN registry along with facilities for discovery and subscription that would be useful for your deployment scenarios, then please engage the Knox community on the dev@ list. We would love to get your insights and perspective!

Master Mentor

@Sam Bass, has this been resolved? Please accept the best answer or provide your own solution.