Hello, I'm new to the domain of Big Data, and I would like to know the main differences between HDF (Hortonworks Data Flow) and HDP (Hortonworks Data Platform), that is, the use cases and the architecture (components) of each one.
I am sure you might have gone through the following links already; if not, they might help clear up some points.
I understood from the last link that:
HDF - is used to handle Data in Motion
HDP - is used to handle Data at Rest
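To make the distinction concrete, here is a minimal, purely illustrative Python sketch (no Hadoop, NiFi, or Kafka APIs involved): batch processing assumes the full dataset is already stored ("data at rest"), while stream processing emits results incrementally as each record arrives ("data in motion").

```python
def batch_sum(dataset):
    # Data at rest: the complete dataset is available up front,
    # so we can compute one answer over all of it (HDP-style batch).
    return sum(dataset)

def streaming_sums(stream):
    # Data in motion: records arrive one at a time; emit a running
    # total per event without holding the full dataset (HDF-style stream).
    total = 0
    for record in stream:
        total += record
        yield total

events = [3, 1, 4, 1, 5]
print(batch_sum(events))                    # -> 14
print(list(streaming_sums(iter(events))))   # -> [3, 4, 8, 9, 14]
```

The difference is not what answer you get, but *when* you get it: the streaming version produces a result after every event, which is what tools like NiFi, Storm, and Kafka are built around.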
But HDP contains Storm (real-time message processing) and Kafka (a distributed messaging system).
So can we say that HDP can also be used to handle data in motion?
Yes, HDP (2.6.x) currently does contain Kafka and Storm, but according to the release notes https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_release-notes/content/deprecated_items.h... those components will be removed from HDP starting with version 3.0.0.
That means that in the near future HDP will no longer handle data in motion.
Yes, they will be moved out of HDP starting with HDP 3.0.0.
The following information is taken from release notes:
The following components are marked moving from HDP and will be moved in a future HDP release to an alternative Hortonworks Subscription and Offering:
| Component or Capability | Status | Marked Moving as of | Target Release for Move |
|---|---|---|---|
| Apache Accumulo | Moving | HDP 2.6.0 | HDP 3.0.0 |
| Apache Kafka | Moving | HDP 2.6.0 | HDP 3.0.0 |
| Apache Storm | Moving | HDP 2.6.0 | HDP 3.0.0 |
| Cloudbreak | Moving | HDP 2.6.0 | HDP 3.0.0 |
What is the reason behind the separation into HDP and HDF? Companies very often need both real-time data processing and batch processing, so why not make a single package?