Stream Processor Selection

Contributor

Hi,

I am prototyping an IoT solution for our organisation and am currently charting out the architecture. I would really appreciate help choosing a stream processor: Storm or Spark Streaming. I am not sure which one to go with. We plan to record sensor events from a fleet, and we are OK with occasional message loss.

We also prefer something that is easy to implement, and I am not sure which of the two is easier.

We are also planning to use the lambda architecture: one layer for batch and the other for real-time information.

Thanks

1 ACCEPTED SOLUTION

Super Guru

@chandramouli muthukumaran

Good question. Let me add Apache NiFi as one of your options, as it is the "easiest" to implement: you orchestrate the entire data flow with a neat UI. NiFi was designed for real-time simple event processing on streams, so if your use case is within this realm, NiFi is the way to go. Moreover, with MiNiFi you can run the process on the small device itself (footprint ~40 MB) and have it push data to the data center. If you require complex event processing, HDF now comes with Storm. So with NiFi you get ease of operations and development, data lineage, message guarantees, and a highly resilient solution. Not to mention back pressure when the target repository is down.
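
To make the back pressure point concrete, here is a minimal sketch of the general idea, not of how NiFi implements it internally. It assumes a bounded in-memory buffer between a sensor producer and a slow or unavailable target; the names `offer` and `buffer` are hypothetical and chosen only for illustration.

```python
import queue


def offer(buffer, event):
    """Try to enqueue an event; return False when the buffer is full.

    A False result is the back pressure signal: the producer should
    slow down, retry later, or (if loss is acceptable) drop the event.
    """
    try:
        buffer.put_nowait(event)
        return True
    except queue.Full:
        return False


# Hypothetical buffer that holds at most two events while the target is down.
buffer = queue.Queue(maxsize=2)
results = [offer(buffer, e) for e in ["e1", "e2", "e3"]]
print(results)
```

Since you are OK with occasional message loss, dropping the rejected event is one option; otherwise the producer would need to wait and retry.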

Spark - if you only require complex event processing and can tolerate micro-batching (batches as small as 0.5 seconds), then Spark may be a good fit. Spark Streaming is, in my opinion, easier to develop in than Storm. No UI.
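
For anyone unsure what micro-batching means in practice, here is a rough sketch in plain Python. It does not use the Spark API; it only shows the idea of grouping events by a fixed interval (0.5 seconds here) and processing each group as one small batch. The function name `micro_batches` and the sample events are assumptions made for illustration.

```python
from collections import defaultdict


def micro_batches(events, interval=0.5):
    """Group (timestamp_seconds, payload) pairs into fixed-interval batches.

    Every event whose timestamp falls in the same interval lands in the
    same batch; batches are returned in time order.
    """
    batches = defaultdict(list)
    for ts, payload in events:
        batch_index = int(ts // interval)  # which interval this event belongs to
        batches[batch_index].append(payload)
    return [batches[i] for i in sorted(batches)]


# Hypothetical sensor events: (seconds since start, reading id).
events = [(0.10, "a"), (0.40, "b"), (0.60, "c"), (1.20, "d")]
print(micro_batches(events))  # [['a', 'b'], ['c'], ['d']]
```

The practical consequence is that an event can wait up to one interval before it is processed, which is the latency trade-off compared with per-event processing.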

Storm - a VERY powerful complex stream processing engine with virtually zero latency. Storm now supports rolling and tumbling windows, and the latest release also has back pressure. No UI.
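
In case the window terms are unfamiliar, the sketch below shows the difference in plain Python, assuming "rolling" means a sliding window and using count-based windows for simplicity. It is not Storm code; the function names and sample readings are hypothetical.

```python
def tumbling_windows(items, size):
    """Split items into consecutive, non-overlapping windows of `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def sliding_windows(items, size, step):
    """Return overlapping windows of `size`, advancing by `step` each time."""
    return [items[i:i + size] for i in range(0, len(items) - size + 1, step)]


# Hypothetical sensor readings.
readings = [1, 2, 3, 4, 5, 6]
print(tumbling_windows(readings, 3))    # [[1, 2, 3], [4, 5, 6]]
print(sliding_windows(readings, 3, 1))  # [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
```

With a tumbling window each reading is counted once; with a sliding window the same reading contributes to several overlapping results, which suits things like moving averages.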

Hope that is helpful as a starting point.

2 REPLIES

Contributor

Thank you very much for your insight.