I am having difficulty understanding when we should choose stream processing over ingestion,
for example Storm vs NiFi, or Spark Streaming vs NiFi.
Some answers suggest that if we want aggregation, windowing operations, etc., we should go for stream processing.
Other than the internal architecture differences, can someone please give an example with real data of something that can be done in Storm/Spark Streaming but cannot be done in NiFi or Flume?
@Avijeet Dash Storm, NiFi, and Flume are all tools that help gather and move data, but they have some key differences.
For example, there is an older streaming data benchmark called Linear Road; details can be found at http://www.cs.brandeis.edu/~linearroad/. The benchmark simulates vehicles on a tollway, with various parameters and conditions that change in real time. This requires that state be maintained and updated for many objects simultaneously as conditions change and the simulated vehicles interact with each other. This is typically referred to as "complex event processing", which Storm does well. Storm bolts can be built to handle the state changes and caching for all of the objects and keep them updated in near real time.
NiFi, on the other hand, excels at what is commonly referred to as "simple event processing". Examples would be converting JSON to SQL, or encrypting/decrypting data in motion. The data is transformed, but each record is handled on its own, so the processing doesn't require the computational overhead that the Linear Road example does.
Flume is designed to facilitate collecting, aggregating, and moving large amounts of data. It is not built for either simple or complex event processing; it is essentially a data movement tool.
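To make the distinction concrete, here is a minimal sketch in plain Python. It uses neither the Storm nor the NiFi API; the function, class, table, and field names (`json_to_sql`, `SegmentSpeedTracker`, `position_reports`, `segment`, `speed`) are made up for illustration, and it assumes reports arrive in timestamp order. The first function is the "simple" case: every record is converted independently. The class is a much-reduced version of the Linear Road style of work: the answer for each new report depends on earlier reports for the same road segment, so state has to be kept and expired.

```python
import json
from collections import defaultdict, deque


def json_to_sql(record_json, table="position_reports"):
    """Stateless transform: each record is converted with no memory of earlier ones."""
    rec = json.loads(record_json)
    cols = ", ".join(rec.keys())
    placeholders = ", ".join(["%s"] * len(rec))
    stmt = f"INSERT INTO {table} ({cols}) VALUES ({placeholders})"
    return stmt, tuple(rec.values())


class SegmentSpeedTracker:
    """Stateful processing: a sliding time window of speed reports per road segment."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        # segment id -> deque of (timestamp, speed), oldest first
        self.reports = defaultdict(deque)

    def update(self, segment, timestamp, speed):
        """Record one report and return the segment's average speed over the window."""
        window = self.reports[segment]
        window.append((timestamp, speed))
        # Evict reports that have fallen out of the time window.
        while window[0][0] <= timestamp - self.window_seconds:
            window.popleft()
        return sum(s for _, s in window) / len(window)


if __name__ == "__main__":
    print(json_to_sql('{"vehicle": 7, "segment": 3, "speed": 60}'))
    tracker = SegmentSpeedTracker(window_seconds=300)
    for seg, ts, speed in [(3, 0, 60), (3, 100, 40), (3, 350, 20)]:
        print(seg, ts, tracker.update(seg, ts, speed))
```

The first function could run on any node, for any record, in any order. The tracker cannot: every report for a given segment has to reach the same state, and that state has to survive across events. A stream processor provides this by routing events by key and managing per-key state, which is roughly what the Storm bolts described above do.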