Created 06-15-2017 10:43 AM
Hello, I'm new to big data and I want to know the main differences between HDF (Hortonworks DataFlow) and HDP (Hortonworks Data Platform), that is, the use cases and the architecture (components) of each one.
Created 06-15-2017 10:56 AM
You may already have gone through the following links; if not, they might help clear up some points.
https://hortonworks.com/products/data-center/hdf/
https://hortonworks.com/webinar/introducing-hortonworks-dataflow/
Created 06-15-2017 11:57 AM
Thank you.
I understood from the last link that:
HDF is used to handle data in motion.
HDP is used to handle data at rest.
But HDP contains Storm (real-time message processing) and Kafka (a distributed messaging system).
So can we say that HDP can also be used to handle data in motion?
Created 06-15-2017 12:04 PM
Hi,
Yes, HDP (2.6.x) currently does contain Kafka and Storm, but according to the release notes https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_release-notes/content/deprecated_items.h... those components will be removed from HDP starting with version 3.0.0.
This means that in the near future HDP will no longer handle data in motion.
Created 06-15-2017 12:18 PM
Thank you @Andres Koitmäe
But according to this link, Storm and Kafka will not be removed from HDP in version 3.0.0.
Created 06-15-2017 08:15 PM
Yes, they will be moved out of HDP starting with HDP 3.0.0.
The following information is taken from the release notes:
The following components are marked as moving from HDP and will be moved in a future HDP release to an alternative Hortonworks Subscription and Offering:
Component or Capability | Status | Marked Moving as of | Target Release for Move |
---|---|---|---|
Apache Accumulo | Moving | HDP 2.6.0 | HDP 3.0.0 |
Apache Kafka | Moving | HDP 2.6.0 | HDP 3.0.0 |
Apache Storm | Moving | HDP 2.6.0 | HDP 3.0.0 |
Cloudbreak | Moving | HDP 2.6.0 | HDP 3.0.0 |
Created 05-29-2018 03:05 PM
What is the reason behind the separation into HDP and HDF? Companies very often need both real-time and batch data processing, so why not make a single package?