Support Questions

Find answers, ask questions, and share your expertise

Which of these approaches to HDF (NiFi) and HDP integration is best practice?

Expert Contributor

A year ago I implemented an HDP platform. Soon after, NiFi was established as the de facto way of integrating external data flows into the cluster. A year on, I'm reimplementing the architecture, and HDF is now available.

So is the assumption now that HDF runs on a node outside of the HDP cluster and pushes data to it, as opposed to my previous setup, where NiFi was installed on a node within the HDP cluster?

1 ACCEPTED SOLUTION


Hi @MPH,

The best practice for a production environment is to have a dedicated cluster for HDF (it makes high availability and resource management easier). However, if you are not looking for high availability and have only one HDF node, you could run HDF on an edge node. Keep in mind that, at the moment, HDP and HDF are managed by two different Ambari instances.
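As a rough sketch of the edge-to-HDP push pattern, a NiFi PutHDFS processor on the HDF side only needs the HDP cluster's client configuration files to write into HDFS. The paths below are placeholders, not actual defaults:

```
# Hypothetical PutHDFS processor settings on the HDF/edge NiFi,
# assuming HDP's core-site.xml and hdfs-site.xml have been copied
# over to the HDF node (e.g. under /etc/hdp-clientconfigs).
Hadoop Configuration Resources : /etc/hdp-clientconfigs/core-site.xml,/etc/hdp-clientconfigs/hdfs-site.xml
Directory                      : /data/landing
Conflict Resolution Strategy   : replace
```

This keeps the flow engine and the storage cluster decoupled: NiFi acts as a plain HDFS client, so no NiFi service needs to run inside HDP itself.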

Hope this helps.


2 REPLIES

Master Guru

@MPH NiFi in production should run in isolation because of its heavy CPU and disk requirements. It is CPU- and disk-bound, so co-locating it with other workloads is not a good idea. This architecture and implementation strategy has not changed, so my recommendation is: isolate your HDF/NiFi cluster from HDP. Don't have these two platforms compete for resources. Also, HDF requires its own Ambari (installed via a management pack, or mpack) and is not managed with the HDP cluster; you essentially end up with two installs of Ambari.
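To make the "two Ambaris" point concrete, here is a minimal sketch of preparing the second Ambari for HDF using the standard `ambari-server install-mpack` command; the tarball path is a placeholder, and the exact mpack file name depends on your HDF version:

```shell
# Hedged sketch: run on the Ambari host that will manage HDF,
# separate from the Ambari host already managing HDP.
# The mpack tarball path below is a placeholder, not a real URL.
sudo ambari-server install-mpack \
    --mpack=/tmp/hdf-ambari-mpack-<version>.tar.gz \
    --verbose

# Restart Ambari so the HDF stack definition becomes available
# in the cluster-install wizard.
sudo ambari-server restart
```

After this, the HDF Ambari can install and manage NiFi on its own nodes, while the HDP Ambari remains untouched.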