Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

What is the difference between Google Cloud Dataflow and Hortonworks DataFlow?

Can someone please help me understand the difference between Google Cloud Dataflow and Hortonworks DataFlow? Are there technical differences I should be aware of?

1 ACCEPTED SOLUTION

New Member

Google Cloud Dataflow is a service that replaces MapReduce processing and is designed strictly for Google Cloud Platform, whereas Hortonworks DataFlow is a product that aims to solve data flow problems, even outside the data center.

So yes, there are technical differences: the two products use similar names to describe very different things. One sits in the cloud waiting for data to be delivered to it, and the other delivers data to all kinds of processing systems, such as Google Dataflow, Storm, and Spark.

3 REPLIES

Master Mentor

Google Dataflow is a programming framework whose pipelines can run on multiple engines, such as Spark, Flink, and MapReduce. Hortonworks DataFlow is a data-in-motion processing tool with a visual editor.

Expert Contributor