
When is Hadoop better than Spark?


New Contributor

Spark seems to outperform Hadoop on every metric, or so it appears. It processes data in memory rather than on disk, it handles both real-time streaming and batch processing, and it also provides a layer for integrating ML.

So right now I'm starting a role at a big investment bank. We operate on petabytes of data, and our data pipeline is built on top of Hadoop. I just graduated, but my friend at Oracle says that a lot of tech companies still use Hadoop as well.

Forgive me if I am naive, but in what ways is Hadoop better? Spark seems to be the best choice if you want a quick answer during real-time analytics. I've never used Hadoop before. What capabilities does Hadoop provide that Spark doesn't? Doesn't Spark provide batch processing as well?

1 REPLY

Re: When is Hadoop better than Spark?

Master Guru
Depends on what you mean by "Hadoop" here. I agree with your points if
by Hadoop you simply meant Apache Hadoop MapReduce. Comparing Hadoop
HDFS and Hadoop YARN to Spark doesn't make much sense though, as Spark
still relies on those to run its application framework in many
environments.
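
To make that dependency concrete, here is a minimal sketch of how a Spark job is typically handed to a Hadoop cluster. The script name `etl_job.py`, the HDFS path, and the executor count are hypothetical placeholders; `--master yarn`, `--deploy-mode cluster`, and `--num-executors` are standard `spark-submit` options. The helper only builds the command line, so it can be inspected without a cluster.

```python
import shlex


def build_spark_submit(app_path, input_uri, executors=4):
    """Return the argv list for submitting a Spark app to a YARN cluster."""
    return [
        "spark-submit",
        "--master", "yarn",          # Hadoop YARN schedules the Spark executors
        "--deploy-mode", "cluster",  # the driver also runs in a YARN container
        "--num-executors", str(executors),
        app_path,                    # hypothetical Spark application script
        input_uri,                   # app argument: the input data lives in HDFS
    ]


# Hypothetical job and input path, for illustration only.
cmd = build_spark_submit("etl_job.py", "hdfs:///data/trades/")
print(shlex.join(cmd))
```

In a setup like this, Spark only replaces the MapReduce step: the data still sits in HDFS and YARN still allocates the resources.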

The word "Hadoop" is used loosely in such comparisons, but mostly
people seem to mean the compute layer (MapReduce), not the persistence
layer (HDFS) and, to some extent, not the scheduler (YARN) either.