HDF Ambari NiFi

Rising Star

Hi All.

I am trying to install HDF on a 9-node cluster. Previously I have only worked on a single-node standalone NiFi instance, so this is quite a big jump for me. A couple of questions:

1. There is no concept of slaves when talking about NiFi, right? When 'Assigning Masters', do I need to add all the nodes on which I want the NiFi service installed, and is that how I get a true 9-node NiFi cluster? By default the Ambari interface recommends installing NiFi on only 3 nodes.

2. By default Ambari installs ZooKeeper on 3 nodes (strangely, the same nodes on which it is installing NiFi). Why is it not using/recommending the rest of the nodes on which the Ambari agent is already installed? Do I have to install the ZooKeeper client on every node I install NiFi on?

3. In a NiFi cluster, is there a single coordinator node that keeps the cluster together? For example, my dataflow puts all the flowfiles it receives from GoldenGate into a folder at the OS level. Do I have to create that folder on every node to keep the cluster in sync?

1 ACCEPTED SOLUTION

Super Mentor
@Faisal Durrani

1. As of HDF 2.x, NiFi was redesigned to be a zero-master clustered service. Previously there was a single NiFi Cluster Manager and a set of member nodes. HDF 2.x and later uses ZooKeeper to elect a Cluster Coordinator and a Primary Node from any of the installed NiFi nodes. This provides an HA control plane in NiFi: if the node currently elected to either of those roles goes down, ZooKeeper elects a replacement from the remaining active nodes. Three nodes is the recommended minimum NiFi cluster size, which is why Ambari starts you there, but you can install as many nodes as you like.
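
Roughly, the cluster-related entries Ambari manages in nifi.properties on each node look like the following (hostnames and ports are placeholders, not values from your cluster):

    # nifi.properties (managed by Ambari; hostnames/ports are placeholders)
    nifi.cluster.is.node=true
    nifi.cluster.node.address=nifi-node1.example.com
    nifi.cluster.node.protocol.port=9088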

2. ZooKeeper needs an odd number of hosts to maintain quorum. While Ambari installs ZK on the same hosts as NiFi by default, you can reassign the ZK instances to any hosts you like. You do not need ZK on every host that runs NiFi, and they do not need to be co-located on the same host. Even with a 9-node NiFi cluster, a 3- or 5-node ZK ensemble is fine. NiFi uses ZK for electing the Cluster Coordinator, electing the Primary Node, and storing cluster-wide state (many components in NiFi record state).
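
As a rough sketch, the ZooKeeper-related entries in nifi.properties look like this (the connect string lists your three ZK hosts; hostnames are placeholders and /nifi is the default root znode):

    # nifi.properties ZooKeeper settings (hostnames are placeholders)
    nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
    nifi.zookeeper.root.node=/nifi
    # cluster-wide component state is stored via the ZooKeeper state provider
    nifi.state.management.provider.cluster=zk-provider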

3. There is a single Cluster Coordinator and a single Primary Node (they can change at any time), and it is not uncommon to see a single NiFi node elected to serve both roles. The Primary Node is responsible for running any processors configured for "primary node" only. Since every NiFi node/instance runs the exact same dataflow (stored in the flow.xml.gz file), every node must be able to run and validate the configuration locally, so any directories or configuration files needed by NiFi components must exist on every node. In your case, where you are consuming from a local file system, it may be best to create a network disk that is identically mounted on every node in your NiFi cluster. You can then configure your NiFi ingest processor of choice (for example ListFile) to run on "primary node" only. That way, should the primary node change, any node can access the same source data and continue ingesting where the previously elected primary node left off.
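
To make that pattern concrete, here is a minimal sketch (the NFS server, export, and mount point below are placeholder names, not something specific to your environment):

    # On every NiFi node, mount the same export at the same path
    mount -t nfs nfs-server.example.com:/exports/goldengate /data/goldengate

    # ListFile processor settings in the NiFi UI
    #   Input Directory          : /data/goldengate
    #   Scheduling -> Execution  : Primary node
    # Pair ListFile with FetchFile so the listing happens only on the primary
    # node while the actual file fetching is distributed across the cluster.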

Hope this addresses all your questions here.

Thanks,

Matt

If you feel this answer addressed your questions, please take a moment to click "Accept" below the answer to mark this thread complete.


4 REPLIES

New Contributor

Detailed information. Thank you.

Rising Star

@Matt Clarke Thanks for your answer. Could you kindly also help with the questions below?

1. While assigning slaves and clients during HDF installation, is 1 node enough for installing the NiFi Certificate Authority (considering we will have the NiFi master service on 4 nodes)?

2. How many ZooKeeper clients / Infra Solr clients are enough for a 3-node ZooKeeper cluster (i.e. the ZooKeeper master service on 3 nodes)? And is it okay to have the client service running on the same node that is also hosting the ZooKeeper master service, or should it be on a different server?

Thank you.

Super Mentor

@Faisal Durrani

1. There can be only one NiFi Certificate Authority. The NiFi CA was provided as a way to quickly and easily create certificates for securing a NiFi cluster for testing/evaluation purposes. We do not recommend using the NiFi CA in production environments; in production you should be using a corporately managed certificate authority to sign your servers' certificates. The Certificate Authority (CA) is used to sign the certificates generated for every NiFi instance. The public key for the certificate authority is then placed in a truststore.jks file that is used on every NiFi instance, while the keystore.jks contains a single PrivateKeyEntry unique to each NiFi host.
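
Whichever CA you use, you can sanity-check the resulting stores with keytool; the paths below are placeholders for wherever your NiFi conf directory lives:

    # Each node's keystore should show exactly one PrivateKeyEntry for that host
    keytool -list -v -keystore /path/to/nifi/conf/keystore.jks

    # The truststore used on every node should show the CA certificate as a trustedCertEntry
    keytool -list -v -keystore /path/to/nifi/conf/truststore.jks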

2. I am not a Solr guy, so I cannot answer that part authoritatively. If you have a 3-node ZK cluster set up, that should be fine to support your NiFi cluster. The ZK client is used to communicate with the ZK cluster, so ZK clients would need to be installed on any hosts that will communicate with the ZK cluster (including the ZK cluster servers themselves). NiFi does not need a ZK client installed, because NiFi includes the ZK client library inside the NiFi application itself. Installing an external ZK client on the same hosts does not affect anything.
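
If you want to confirm that NiFi is registering with ZooKeeper, you can check from any host that has the ZK client installed. This is just a sketch; the hostname is a placeholder and the /nifi path assumes the default nifi.zookeeper.root.node:

    # Connect with the ZooKeeper CLI (hostname is a placeholder)
    zkCli.sh -server zk1.example.com:2181

    # Inside the CLI, list NiFi's znodes; leader election data typically lives under /nifi/leaders
    ls /nifi
    ls /nifi/leaders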

Thanks,

Matt