
Using Storm and Kafka for System of Insight (HDP) to and from System of Engagement (i.e., mobile app) interaction?

Expert Contributor

The SoI pushing insights to the SoE seems like a great fit, but what about a sync or async API call from a mobile app that needs a little more than a precalculated SoI store, i.e., some additional quick analytics work in the SoI to build the response? The thought is Storm with Trident (maybe JMS) from the API and/or bus platform for the sync call, and possibly Kafka for the async API scenario, where the response would be stored in a cache for async retrieval. Does this seem like a good fit? Thinking a service wrapper down the road may also make sense?

1 ACCEPTED SOLUTION

Guru

I recently built a demo that does something like this. Keep in mind that as soon as you add a CEP tool like Storm or a queuing system like Kafka or JMS into the architecture, your solution becomes async. For the sync portion, just set up a basic REST web service that makes a synchronous call to the backend. For the async-with-analytics part, try this:

1. NiFi listens on HTTP to receive the REST web service call from the mobile app. Use NiFi to shape the request and route it to the correct Kafka topic in case you need multiple points of entry. In either case it gives you the flexibility to change the data model of the request and/or the response without having to change both the client and the server.

2. Kafka queues the request to provide durable, at-least-once delivery.

3. Storm consumes the request and applies whatever analytics you need, pulling whatever data is required from the data serving layer. You can use a cache or a data grid like Ignite or GemFire; however, unless you need sub-millisecond response or you are doing 250K TPS or more, I would just go with HBase as the data serving layer (HBase can handle 250K+ TPS, but you need more region servers and some tuning).

4. At this point Storm has the response and can either post it back to Kafka or to HTTP, where NiFi can consume it and deliver it to the mobile app through something like Google Cloud Messaging.

This architecture will give you a very flexible, very near-real-time async analytics platform that will scale up as far as you want to go.
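To make the Storm leg concrete, here is a minimal sketch (not a production topology) of the consume-analyze-respond step using the storm-kafka connector on Storm 1.x: a Kafka spout reads the requests NiFi routed onto a topic, a bolt enriches each request from an HBase table, and a KafkaBolt publishes the response onto a topic for NiFi to pick up. The topic names, the "insights" table and its column layout, and the host addresses are illustrative assumptions, not part of the demo described above.

```java
import java.io.IOException;
import java.util.Map;
import java.util.Properties;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.kafka.bolt.KafkaBolt;
import org.apache.storm.kafka.bolt.mapper.FieldNameBasedTupleToKafkaMapper;
import org.apache.storm.kafka.bolt.selector.DefaultTopicSelector;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.FailedException;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class AsyncInsightTopology {

    /** Enriches each request with precomputed insight data from HBase. */
    public static class AnalyticsBolt extends BaseBasicBolt {
        private transient Connection hbase;

        @Override
        public void prepare(Map stormConf, TopologyContext context) {
            try {
                hbase = ConnectionFactory.createConnection(HBaseConfiguration.create());
            } catch (IOException e) {
                throw new RuntimeException("Cannot connect to HBase", e);
            }
        }

        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String requestId = input.getString(0);  // StringScheme emits a single "str" field
            try (Table table = hbase.getTable(TableName.valueOf("insights"))) {
                Result row = table.get(new Get(Bytes.toBytes(requestId)));
                byte[] v = row.getValue(Bytes.toBytes("d"), Bytes.toBytes("score"));
                // ...apply whatever additional quick analytics the response needs here...
                collector.emit(new Values(requestId, v == null ? "none" : Bytes.toString(v)));
            } catch (IOException e) {
                throw new FailedException(e);  // fail the tuple so it is replayed from Kafka
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Field names expected by FieldNameBasedTupleToKafkaMapper below
            declarer.declare(new Fields("key", "message"));
        }
    }

    public static void main(String[] args) throws Exception {
        // Kafka spout reading the requests NiFi routed onto a topic (hosts/topics assumed)
        SpoutConfig spoutCfg = new SpoutConfig(new ZkHosts("zk1:2181"),
                "insight-requests", "/storm-kafka", "insight-reader");
        spoutCfg.scheme = new SchemeAsMultiScheme(new StringScheme());

        // Producer settings for the response topic that NiFi consumes
        Properties producer = new Properties();
        producer.put("bootstrap.servers", "kafka1:6667");
        producer.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producer.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("requests", new KafkaSpout(spoutCfg), 2);
        builder.setBolt("analytics", new AnalyticsBolt(), 4).shuffleGrouping("requests");
        builder.setBolt("responses", new KafkaBolt<String, String>()
                        .withProducerProperties(producer)
                        .withTopicSelector(new DefaultTopicSelector("insight-responses"))
                        .withTupleToKafkaMapper(new FieldNameBasedTupleToKafkaMapper<String, String>()),
                2).shuffleGrouping("analytics");

        StormSubmitter.submitTopology("async-insights", new Config(), builder.createTopology());
    }
}
```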


5 REPLIES


Expert Contributor

Thanks Vadim, this helps. On the sync side, if the web service or API manager handling the REST call from the client sits outside the Hadoop cluster, what would be your recommendation for handling the sync call into the cluster if some of the work is better done there? I know there are many options, just curious about your perspective.

Guru

@dconnolly The simplest approach is to build a REST web service on a servlet container like Tomcat or Jetty. Strictly speaking, you could publish that web service anywhere; however, I would recommend that you leverage the existing resources available in the Hadoop cluster and use Slider to run the web service on YARN. Try this:

Create a web service and build it as a runnable jar. Put that jar into a Linux Docker container and then create the configuration files needed to run the Docker container with Slider. A sketch of what that service could look like follows the links below.

https://slider.incubator.apache.org/docs/slider_specs/application_pkg_docker.html

http://www.slideshare.net/hortonworks/docker-on-slider-45493303
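As a starting point, here is a minimal sketch of the runnable-jar web service itself, using embedded Jetty. The class names, endpoint path, port, and the stubbed backend lookup are all illustrative assumptions; any servlet container or JAX-RS stack would work the same way.

```java
import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.servlet.ServletHolder;

/** Self-contained sync REST service; package as a runnable jar (e.g., Maven shade plugin). */
public class SyncInsightService {

    /** Handles GET /insight/{id} with a blocking call into the cluster. */
    public static class InsightServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            String id = req.getPathInfo() == null ? "" : req.getPathInfo().substring(1);
            // Synchronous lookup against the data serving layer
            // (e.g., the same HBase get shown in the topology sketch above).
            String insight = lookupInsight(id);
            resp.setContentType("application/json");
            resp.getWriter().write("{\"id\":\"" + id + "\",\"insight\":\"" + insight + "\"}");
        }

        private String lookupInsight(String id) {
            return "stub";  // placeholder: replace with the real backend call
        }
    }

    public static void main(String[] args) throws Exception {
        Server server = new Server(8080);  // port is illustrative; Slider config can supply it
        ServletContextHandler ctx = new ServletContextHandler(ServletContextHandler.NO_SESSIONS);
        ctx.setContextPath("/");
        ctx.addServlet(new ServletHolder(new InsightServlet()), "/insight/*");
        server.setHandler(ctx);
        server.start();
        server.join();
    }
}
```

Because the jar carries its own HTTP server, the Dockerfile only needs a JRE plus the jar, which keeps the Slider packaging described in the links above simple.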

Use Slider to start the Docker container on YARN and secure the listener endpoint with Knox. This would allow you to leverage the resources of the cluster, manage the sync service's resources with YARN, and provide security for the API endpoint with Knox.

Expert Contributor

Thanks Vadim, I like the approach, much appreciated. Using Docker, I could also go the node.js route.

Guru

@dconnolly My pleasure. Would you mind accepting the answer and upvoting?