Support Questions

Which of these approaches to HDF (NiFi) and HDP integration is best practice?

Solved

Contributor

A year ago I implemented an HDP platform. Soon after, NiFi was established as the de facto way of integrating external data flows into the cluster. A year on, I'm reimplementing the architecture, and HDF is now available.

So is the assumption now that HDF runs on a node outside of the HDP cluster and pushes data into it, as opposed to my previous setup, where NiFi was installed on a node within the HDP cluster?

1 ACCEPTED SOLUTION


Re: Which of these approaches to HDF (NiFi) and HDP integration is best practice?

Hi @MPH,

The best practice for a production environment is to have a dedicated cluster for HDF (it makes high availability and resource management easier). However, if you are not looking for high availability and only have one HDF node, you could run HDF on an edge node. Keep in mind that, at the moment, HDP and HDF are managed by two different Ambari instances.
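To make the "HDF pushes data into HDP" pattern concrete, here is a minimal sketch that uses the NiFi REST API on the HDF side to create a PutHDFS processor pointed at copies of the HDP cluster's client configuration files. It assumes an unsecured NiFi instance; the host name, file paths, and target directory are hypothetical, and a secured environment would additionally need TLS and Kerberos settings.

import requests

NIFI_API = "http://hdf-nifi.example.com:9090/nifi-api"  # hypothetical HDF NiFi node (unsecured)

# Look up the root process group of the HDF-managed NiFi instance.
root_id = requests.get(f"{NIFI_API}/process-groups/root").json()["id"]

# Create a PutHDFS processor that writes into the remote HDP cluster's HDFS.
# The *.xml files are copies of the HDP cluster's client configs placed on the HDF node.
payload = {
    "revision": {"version": 0},
    "component": {
        "type": "org.apache.nifi.processors.hadoop.PutHDFS",
        "name": "Push to HDP HDFS",
        "config": {
            "properties": {
                "Hadoop Configuration Resources": "/etc/hdp-clients/core-site.xml,/etc/hdp-clients/hdfs-site.xml",
                "Directory": "/data/landing",
                "Conflict Resolution Strategy": "replace",
            }
        },
    },
}

resp = requests.post(f"{NIFI_API}/process-groups/{root_id}/processors", json=payload)
resp.raise_for_status()
print("Created PutHDFS processor:", resp.json()["id"])

The point of the sketch is only that the HDF side holds the flow and the connection details, while the HDP cluster is just the destination file system, which is why the two platforms can be sized and managed independently.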

Hope this helps.


2 REPLIES

Re: Which of these approaches to HDF (NiFi) and HDP integration is best practice?

Super Guru

@MPH NiFi in production should run in isolation because of its heavy CPU and disk requirements. It is both CPU- and disk-bound, so it is not a good idea to co-locate it with other workloads. That architecture and implementation strategy has not changed, so my recommendation is to isolate your HDF/NiFi cluster from HDP. Don't have these two platforms compete for resources. Also, HDF requires its own Ambari (management pack) and is not managed by the HDP cluster's Ambari, so you essentially end up with two Ambari installs.
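Since each platform ends up with its own Ambari server, one way to picture the split is to query both Ambari REST endpoints and see that each one manages only its own cluster. A rough sketch, assuming default ports and placeholder host names and credentials:

import requests

# Hypothetical hosts: one Ambari server per platform, each managing its own cluster.
AMBARI_SERVERS = {
    "HDP": "http://hdp-ambari.example.com:8080",
    "HDF": "http://hdf-ambari.example.com:8080",
}

for platform, base_url in AMBARI_SERVERS.items():
    resp = requests.get(
        f"{base_url}/api/v1/clusters",
        auth=("admin", "admin"),               # placeholder credentials
        headers={"X-Requested-By": "ambari"},  # header Ambari requires for write calls; harmless on GET
    )
    resp.raise_for_status()
    names = [item["Clusters"]["cluster_name"] for item in resp.json()["items"]]
    print(f"{platform} Ambari at {base_url} manages: {names}")

Each server reports only its own cluster, which reflects the point above: the HDF management pack goes into its own Ambari install rather than the one managing HDP.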

