When is Hadoop better than Spark?

Spark seems to outperform Hadoop on every metric, or so it appears. It processes data in memory rather than on disk, it handles both real-time streaming and batch processing, and it also provides a layer for integrating ML.

Right now I'm starting a role at a big investment bank. We operate on petabytes of data, and our data pipeline is built on top of Hadoop. I just graduated, but my friend at Oracle says that a lot of tech companies still use Hadoop as well.

Forgive me if I am naive, but in what ways is Hadoop better? Spark seems to be the best choice if you want quick answers from real-time analytics. I've never used Hadoop before. What capabilities does Hadoop provide that Spark doesn't? Doesn't Spark provide batch processing as well?

Re: When is Hadoop better than Spark?

Depends on what you mean by "Hadoop" here. I agree with your points if
by Hadoop you simply mean Apache Hadoop MapReduce. Comparing Hadoop
HDFS and Hadoop YARN to Spark doesn't make much sense though, as Spark
still relies on those to run its application framework in many
cases.

The word "Hadoop" is used loosely in such comparisons, but mostly
people seem to mean the compute layer (MapReduce), not the persistence
layer (HDFS), and to some extent not the scheduler (YARN) either.
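
To make the layering concrete, here is a toy sketch in plain Python. It
uses no real Hadoop or Spark APIs: STORAGE, read_lines and the two
word-count functions are made-up names for illustration only. The dict
stands in for the persistence layer, and the two functions stand in for
two different compute engines reading the same stored data, one with
explicit map/shuffle/reduce phases and one as a single chained
in-memory pipeline.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain

# Stand-in for the persistence layer (HDFS in a real cluster): path -> lines.
# Hypothetical data, purely for illustration.
STORAGE = {
    "/data/part-0": ["spark hadoop spark"],
    "/data/part-1": ["hadoop yarn hdfs", "spark"],
}


def read_lines(storage):
    """Read every line from every 'file' in the shared storage layer."""
    return list(chain.from_iterable(storage[path] for path in sorted(storage)))


def mapreduce_style_wordcount(storage):
    """Toy compute engine A: explicit map, shuffle, and reduce phases."""
    # Map phase: emit a (word, 1) pair for every word.
    mapped = [(word, 1) for line in read_lines(storage) for word in line.split()]
    # Shuffle phase: group the emitted values by key.
    groups = defaultdict(list)
    for word, one in mapped:
        groups[word].append(one)
    # Reduce phase: sum the values in each group.
    return {word: sum(ones) for word, ones in groups.items()}


def chained_style_wordcount(storage):
    """Toy compute engine B: one chained in-memory pipeline."""
    words = (word for line in read_lines(storage) for word in line.split())

    def add(counts, word):
        counts[word] = counts.get(word, 0) + 1
        return counts

    return reduce(add, words, {})


counts_a = mapreduce_style_wordcount(STORAGE)
counts_b = chained_style_wordcount(STORAGE)
print(counts_a == counts_b)
```

Both engines read from the same storage and produce the same counts;
only the compute layer differs, and that is the layer these comparisons
are usually about.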