Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Performance Comparison Between Spark and Storm

New Member

Good morning, everyone.

Before I get to my question, some background: I'm an IT college student in Indonesia. This is my final year, and I want to build a Big Data project. My plan is to ingest data with Apache NiFi, send it to an output, and then process it with both Spark and Storm.

1. Which performance comparison should I do? Response time? Throughput? Real-time processing speed?

2. Is this even possible? I've read online that benchmarks between Spark and Storm are considered inaccurate, and I couldn't find a paper or journal on this. ( http://www.slideshare.net/ptgoetz/apache-storm-vs-spark-streaming )

3. If you have a better idea for my final project, please let me know.

Thanks in advance, and have a nice day.

1 ACCEPTED SOLUTION

Master Guru

@Rendiyono Wahyu Saputro I recommend looking at Storm vs. Spark in a different manner. If your stream processing can tolerate some latency (as little as half a second), then Spark may be the way to go; this is just my opinion, as Spark Streaming is so easy to use. Storm is a powerful engine with virtually zero latency, and it has been clocked at millions of tuples per node per second. So you have to ask yourself whether your use case truly needs near-zero latency or whether it can handle micro-batching (Spark Streaming).
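To make the latency trade-off concrete, here is a small conceptual sketch (plain Python, not actual Spark or Storm code): it simulates a stream of events and compares the average latency when each event is processed immediately (Storm-style, per-tuple) versus when events wait for the end of a fixed batch window (Spark-Streaming-style micro-batching). The arrival rate, batch interval, and per-event processing time are all hypothetical numbers chosen just for illustration.

```python
import random
from statistics import mean

BATCH_INTERVAL = 0.5   # seconds; a hypothetical Spark Streaming batch duration
PROCESS_TIME = 0.001   # seconds; hypothetical per-event processing cost

def simulate_arrivals(n_events, rate_hz=100, seed=42):
    """Return simulated arrival timestamps for n_events at ~rate_hz events/s."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    for _ in range(n_events):
        t += rng.expovariate(rate_hz)  # exponential inter-arrival times
        arrivals.append(t)
    return arrivals

def per_tuple_latencies(arrivals, process_time=PROCESS_TIME):
    """Storm-style: each event is processed as soon as it arrives."""
    return [process_time for _ in arrivals]

def micro_batch_latencies(arrivals, interval=BATCH_INTERVAL,
                          process_time=PROCESS_TIME):
    """Micro-batch style: each event waits for the end of its batch window."""
    latencies = []
    for t in arrivals:
        batch_end = (int(t / interval) + 1) * interval
        latencies.append((batch_end - t) + process_time)
    return latencies

arrivals = simulate_arrivals(1000)
storm_like = mean(per_tuple_latencies(arrivals))
spark_like = mean(micro_batch_latencies(arrivals))
print(f"per-tuple mean latency:   {storm_like * 1000:.1f} ms")
print(f"micro-batch mean latency: {spark_like * 1000:.1f} ms")
```

With a 0.5-second batch window, an event waits on average about half the interval before its batch fires, so the micro-batch mean latency lands in the hundreds of milliseconds while per-tuple latency stays near the raw processing cost. That gap is exactly the question to ask about your use case, and measuring mean latency alongside throughput would also answer question 1 above.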

