
Combine HDP and HDF

Explorer

I am just curious about why the options in HDP and HDF can't be the same. Specifically, why can't I simply set up nodes to support both the HDF model and the HDP model within the same cluster?

Accepted Solution

Hortonworks DataFlow and Hortonworks Data Platform share some components (Storm and Kafka, for example). The two solutions complement each other because they target different use cases. The reason HDP doesn't offer HDF as an installable component is that HDF is designed to be very lightweight and to be installed on edge nodes where power, cooling, and space are constrained. Nodes used for HDP are typically more powerful than nodes used for HDF.



Super Guru

HDF consists of Apache NiFi, Apache Storm, and Apache Kafka, whereas HDP includes other Hadoop components and overlaps with HDF only in Apache Storm.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2-Win/bk_HDP_Install_Win/content/iug_hdp_comp...

Can you please elaborate on "options in HDP and HDF can't be same"?

The reason you need different nodes is the underlying differences between the technologies and the use cases they serve.

Explorer

I am trying to understand why HDP and HDF are different stacks, instead of one stack that lets you pick the components you want. Presently, NiFi is not an option in HDP, so I have to either add it manually or install a new machine with HDF. I'm curious why it is like that.

Super Guru

@Christopher Amatulli

Oh. That's because NiFi, Kafka, and Storm are independent of the Hadoop ecosystem. They work regardless of whether data is being written to Hadoop or to any other persistence layer within your organization. This lets Hortonworks help customers who are not using Hadoop but would still like to use NiFi and Kafka to ingest/collect data and deliver it to systems other than Hadoop. We have a number of customers who are using HDF and not HDP, so keeping the two separate enables us to help such customers.

Explorer

OK, thanks, that explains why there is a second offering and why the packaging differs. I still don't understand why the HDP stack doesn't have an install for NiFi. That only explains why HDF is a separate package.

Super Guru

You can use Ambari to install NiFi (https://github.com/abajwa-hw/ambari-nifi-service), but it really needs to be its own cluster, as it's not part of the Apache Hadoop stack.
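The custom-service route works because Ambari picks up service definitions placed under its stack directory and offers them in "Add Service" after a server restart. Below is a minimal Python sketch, assuming the default Ambari resources path `/var/lib/ambari-server/resources/stacks` and an HDP stack version string such as `2.4`; the helper names are hypothetical, and the repository's own README remains the authority on the exact steps. By default it only returns the commands instead of running them.

```python
import subprocess
from pathlib import Path

# Assumption: default Ambari server resources location on the Ambari host.
AMBARI_STACKS_DIR = Path("/var/lib/ambari-server/resources/stacks")
NIFI_SERVICE_REPO = "https://github.com/abajwa-hw/ambari-nifi-service.git"


def custom_service_dir(stack_version: str, service_name: str = "NIFI",
                       stacks_dir: Path = AMBARI_STACKS_DIR) -> Path:
    """Hypothetical helper: directory where Ambari looks for a custom service definition."""
    return stacks_dir / "HDP" / stack_version / "services" / service_name


def install_nifi_service(stack_version: str, dry_run: bool = True) -> list[list[str]]:
    """Hypothetical helper: clone the NiFi service definition, then restart Ambari.

    With dry_run=True (the default) the commands are only returned, not executed.
    """
    target = custom_service_dir(stack_version)
    commands = [
        ["git", "clone", NIFI_SERVICE_REPO, str(target)],
        ["ambari-server", "restart"],
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop on the first failing step
    return commands
```

Calling `install_nifi_service("2.4")` returns the two commands for review; passing `dry_run=False` on the Ambari server host would execute them. NiFi still runs on its own nodes afterwards; Ambari only manages it.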

I like to think of HDP and HDF as peanut butter and jelly. Awesome together.

Explorer

Timothy, thanks, the manual install is what I ended up doing. But I was curious why it needs to be its own cluster, and why I can't just have a single cluster with resources separated for the different HDP/HDF processes.

I agree with the PB&J too! I just don't know why I don't get my PB&J in one sandwich by default 🙂

Super Guru

Hadoop clusters are made up of NameNodes and DataNodes. A number of servers and services use HDFS and those nodes directly, so they need to be tightly bundled. NiFi really is separate: it's an edge node that can run in cars, sensors, or industrial devices. It makes more sense to keep it on its own separate cluster, as it has its own clustering and doesn't use the NameNode, ZooKeeper, or the rest of the Hadoop infrastructure. It works well writing to HDFS and Hadoop services, but it also works well with Azure, AWS, JMS, MQTT, and other non-Hadoop sources.


Master Guru

@Michael Young

HDF NiFi at its core is designed to be very lightweight; however, how powerful a host/node HDF NiFi needs really depends on the complexity of the implemented dataflow and on the throughput and data volumes that dataflow will handle. HDF NiFi may be deployed at the edge, but edge deployments usually come with a centralized cluster deployment that runs a much more complex dataflow, handling data coming from the edge NiFi instances as well as from many other application sources.

Thanks,

Matt

Cloudera Employee

Hi,

With HDF 3.0 and Ambari 2.5.1, we can install HDF on an existing HDP cluster. Please have a look:

https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.2/index.html
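After installing HDF services onto an HDP cluster through Ambari, one way to confirm that both stacks really live in the same cluster is to list the cluster's services with Ambari's REST API (`GET /api/v1/clusters/<cluster>/services`). Below is a minimal standard-library Python sketch; the Ambari URL, cluster name, and credentials are placeholders, and treating `HDFS` as the marker for HDP and `NIFI` as the marker for HDF is an assumption made for illustration.

```python
import base64
import json
import urllib.request


def service_names(payload: dict) -> set[str]:
    """Extract service names from an Ambari /services response body."""
    return {item["ServiceInfo"]["service_name"] for item in payload.get("items", [])}


def fetch_services(ambari_url: str, cluster: str, user: str, password: str) -> set[str]:
    """Query Ambari for the services installed in one cluster (placeholder credentials)."""
    req = urllib.request.Request(f"{ambari_url}/api/v1/clusters/{cluster}/services")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")  # HTTP basic auth
    req.add_header("X-Requested-By", "ambari")         # header Ambari expects on API calls
    with urllib.request.urlopen(req) as resp:
        return service_names(json.load(resp))


def is_combined_cluster(services: set[str]) -> bool:
    # Hypothetical check: HDFS stands in for HDP, NIFI stands in for HDF.
    return {"HDFS", "NIFI"} <= services
```

For example, `is_combined_cluster(fetch_services("http://ambari-host:8080", "mycluster", "admin", "admin"))` would be expected to return `True` on a cluster where HDF services were added to HDP; the parsing is kept in `service_names` so it can be checked without a live Ambari server.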

Thanks,

Avijeet

New Contributor

So why not create a sandbox (and tutorials) for either HDP only or HDP+HDF combined? The HDF sandbox isn't even useful for the HDF tutorials, since it lacks HDFS, HBase, Superset, Druid, etc. The HDF sandbox seems to be defective as a learning vehicle. The crippled/limited HDF-only sandbox doesn't seem suited to its purpose. What am I missing?