What is difference between Hadoop and Spark?

New Contributor

I think Hadoop and Spark are both big data frameworks, so why is Spark said to be killing Hadoop? What is the difference between Hadoop and Spark?

Re: What is difference between Hadoop and Spark?

@Pritam Pal

Hadoop and Spark are two different frameworks. At a very high level, Hadoop provides the storage layer (HDFS) and Spark is a processing engine.

You can store the data in HDFS and run aggregations with Spark on top of that data.

Re: What is difference between Hadoop and Spark?

Guru

@Pritam Pal, Hadoop is a combination of HDFS (data storage), YARN (resource management and application execution framework), and MapReduce (data processing engine). So it is not really fair to compare Hadoop as a whole with Spark. MapReduce and Spark are comparable because both of them are data processing engines.

Here is a good link that compares MapReduce and Spark in detail:

https://www.xplenty.com/blog/apache-spark-vs-hadoop-mapreduce/
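
For readers who have not seen the MapReduce model, here is a minimal sketch of its three steps (map, shuffle, reduce) in plain Python. It is only an illustration of the idea, not Hadoop's actual API; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are made up for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Run the three phases in order on an in-process list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Spark processes data"]))
```

In a real cluster the map and reduce steps run on many machines and the shuffle moves data across the network; Spark expresses the same kind of computation but can chain many such steps without a separate job for each.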

Re: What is difference between Hadoop and Spark?

New Contributor

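Hadoop (MapReduce)

  • Writes intermediate data to disk between processing steps
  • Slower, because of the repeated disk reads and writes
  • Suitable for batch processing
  • Needs an external scheduler (for example Oozie) to chain jobs into workflows
  • High latency
  • No built-in interactive mode
  • Runs on less expensive hardware
  • Harder to program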

Spark

  • Keeps intermediate data in memory where possible
  • Faster, especially for jobs that reuse the same data
  • Suitable for batch and near-real-time (streaming) processing
  • Schedules the tasks of a job itself
  • Low latency
  • Has an interactive mode (shell)
  • Needs a lot of RAM to run in memory, so the cluster cost rises as memory is added
  • Easier to program
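
The disk versus in-memory point in the two lists above can be sketched with a toy two-stage pipeline in plain Python. This is only an illustration under simplifying assumptions, not how either framework is implemented; `disk_backed_pipeline` and `in_memory_pipeline` are hypothetical names, and a local temporary file stands in for the storage a MapReduce job would write to.

```python
import json
import os
import tempfile


def square(values):
    """Stage 1: transform every value."""
    return [v * v for v in values]


def total(values):
    """Stage 2: aggregate the transformed values."""
    return sum(values)


def disk_backed_pipeline(values):
    """MapReduce-style: stage 1 output is written to disk, then re-read by stage 2."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stage1.json")
        with open(path, "w") as f:
            json.dump(square(values), f)
        with open(path) as f:
            intermediate = json.load(f)
        return total(intermediate)


def in_memory_pipeline(values):
    """Spark-style: the intermediate result stays in memory between stages."""
    intermediate = square(values)
    return total(intermediate)


if __name__ == "__main__":
    data = [1, 2, 3]
    print(disk_backed_pipeline(data), in_memory_pipeline(data))
```

Both functions return the same answer; the difference is the extra write and read between stages, which is where much of the speed gap in multi-step or iterative jobs comes from, and also why Spark needs more RAM.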