
What is the difference between Google Dataflow and Hortonworks DataFlow?

Can someone please help me understand the difference between Google Cloud Dataflow and Hortonworks DataFlow? Are there technical differences I should be aware of?

1 ACCEPTED SOLUTION

Contributor

Google Cloud Dataflow is a managed service that replaces MapReduce-style processing and runs only on Google Cloud Platform. Hortonworks DataFlow, by contrast, is a product for solving data-flow problems (moving data between systems), including systems outside the data center.

So no, they are not the same thing: the similar names describe very different products. One sits in the cloud waiting for data to be delivered to it; the other delivers data to all kinds of processing systems, such as Google Dataflow, Storm, and Spark.
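To make the "waits for data" versus "delivers data" distinction concrete, here is a minimal, purely illustrative Python sketch. It does not use either product's API: `route_records`, `process_batch`, and the destination names are hypothetical. The router plays the Hortonworks DataFlow role (deciding where each record goes), while `process_batch` plays the Google Dataflow role (computing a result over whatever data reaches it).

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def route_records(
    records: Iterable[Record],
    rules: Dict[str, Callable[[Record], bool]],
) -> Dict[str, List[Record]]:
    """Hypothetical data-flow role: deliver each record to every destination
    whose rule matches. The records themselves are passed through unchanged."""
    deliveries: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        for destination, matches in rules.items():
            if matches(record):
                deliveries[destination].append(record)
    return dict(deliveries)


def process_batch(records: Iterable[Record]) -> Counter:
    """Hypothetical processing role: take the data that was delivered and
    compute an aggregate over it, here the number of events per user."""
    return Counter(record["user"] for record in records)


if __name__ == "__main__":
    events = [
        {"user": "alice", "type": "click"},
        {"user": "bob", "type": "error"},
        {"user": "alice", "type": "error"},
    ]
    # Hypothetical destinations: every event goes to the processing service,
    # and error events are also copied to an alerting system.
    rules = {
        "processing_service": lambda r: True,
        "alerting": lambda r: r["type"] == "error",
    }
    deliveries = route_records(events, rules)
    print(process_batch(deliveries["processing_service"]))
    # prints Counter({'alice': 2, 'bob': 1})
```

In this sketch the router never transforms a record; that is the dividing line between delivering data and processing it.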

3 Replies

Master Mentor

Google Dataflow is a programming framework (a model and SDK for defining pipelines) that can run on multiple engines, such as Spark, Flink, and MapReduce. Hortonworks DataFlow is a tool for processing data in motion, with a visual editor for building flows.
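The "one framework, several engines" idea can be illustrated with a small Python sketch. This is not the Dataflow SDK's API: `build_pipeline`, `EagerRunner`, `LazyRunner`, and `execute` are hypothetical names, and the two "engines" are plain Python stand-ins for real back ends such as Spark or Flink. The point is only that the pipeline definition never mentions the engine that executes it.

```python
from typing import Callable, Iterable, Iterator, List, Protocol

# Hypothetical: a pipeline is an ordered list of steps, each turning one
# stream of elements into another. Nothing here refers to an engine.
Step = Callable[[Iterable[str]], Iterator[str]]


def build_pipeline() -> List[Step]:
    """Define the processing logic once: split lines into words,
    lowercase them, and keep words longer than three characters."""
    return [
        lambda lines: (word for line in lines for word in line.split()),
        lambda words: (w.lower() for w in words),
        lambda words: (w for w in words if len(w) > 3),
    ]


class Runner(Protocol):
    """Any object with this method can execute a pipeline."""

    def run(self, steps: List[Step], data: Iterable[str]) -> List[str]: ...


class EagerRunner:
    """Hypothetical engine #1: materializes the full output of each step
    before starting the next one (batch style)."""

    def run(self, steps: List[Step], data: Iterable[str]) -> List[str]:
        result: List[str] = list(data)
        for step in steps:
            result = list(step(result))
        return result


class LazyRunner:
    """Hypothetical engine #2: chains the steps as generators, so elements
    are pulled through the whole chain one at a time (streaming style)."""

    def run(self, steps: List[Step], data: Iterable[str]) -> List[str]:
        stream: Iterator[str] = iter(data)
        for step in steps:
            stream = step(stream)
        return list(stream)


def execute(runner: Runner, data: Iterable[str]) -> List[str]:
    # The same pipeline definition is handed to whichever runner is chosen.
    return runner.run(build_pipeline(), data)


if __name__ == "__main__":
    lines = ["Spark and Flink", "run the SAME pipeline"]
    for runner in (EagerRunner(), LazyRunner()):
        print(type(runner).__name__, execute(runner, lines))
    # EagerRunner ['spark', 'flink', 'same', 'pipeline']
    # LazyRunner ['spark', 'flink', 'same', 'pipeline']
```

In the real Dataflow (now Apache Beam) SDKs the runner is chosen through pipeline options such as `--runner` rather than by passing an object, but the separation between pipeline definition and execution engine is the same idea.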

Expert Contributor