Minimum hardware and clustering requirements for HDF 2.0

Expert Contributor

Hi,

We are planning to start the implementation of an IoT use case (roughly 35,000 vehicle signals per minute at this time, each with a small message size).

Could you please help with the following questions?

- Are physical servers recommended for HDF rather than VMs?

- What is the minimum number of nodes needed for clustering?

- What are the minimum hardware requirements per node?

Thanks

SJ

1 ACCEPTED SOLUTION

Master Mentor

@Sanaz Janbakhsh

HDF 2.0 is based on Apache NiFi 1.0, which no longer has an NCM (NCM-based clusters exist only in HDF 1.x and Apache NiFi 0.x). HDF 2.0 uses a zero-master cluster, which requires ZooKeeper (a minimum of 3 ZK nodes for a quorum) for the cluster coordinator and primary node designations and for storing your cluster-wide state.
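
As a rough sketch, the cluster-related entries in nifi.properties on each node look something like the following (the hostnames, ports, and ZooKeeper connect string are illustrative placeholders, not values from this thread):

    # nifi.properties (per node) -- illustrative values only
    nifi.cluster.is.node=true
    nifi.cluster.node.address=nifi-node1.example.com
    nifi.cluster.node.protocol.port=9998
    # External ZooKeeper quorum used for coordinator/primary node elections and cluster-wide state
    nifi.state.management.embedded.zookeeper.start=false
    nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
    nifi.zookeeper.root.node=/nifi
    # How long nodes wait for the flow election when the cluster starts up
    nifi.cluster.flow.election.max.wait.time=5 mins
    nifi.cluster.flow.election.max.candidates=3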

- Are physical servers recommended for HDF rather than VMs? I do recommend physical servers over VMs for NiFi. Depending on the dataflow(s) you design (which processor and controller service components you use), the load put on your servers can range from very light to very heavy.

- What is the minimum number of nodes needed for clustering? There is no minimum number of hosts in a NiFi cluster. You can even stand up a 1-node cluster (pointless, and it will actually perform worse than a standalone NiFi because of the additional cluster overhead). I suggest starting with a 3-node cluster to spread out your load and provide coverage if a node is lost. You can add nodes to an existing NiFi cluster later with minimal effort.

- What are the minimum hardware requirements per node? Not knowing exactly what you plan on doing in your dataflow with your 35,000 FlowFiles per minute, it is difficult to make any CPU suggestions. Generally speaking, it is good practice to set up a POC and see how it scales. Because you are working with a large number of very small files, NiFi JVM heap usage could be high, so make sure you have enough memory on each node to give NiFi at least 8 GB of heap to start with. You will also need additional memory for the OS and any other services running on these hosts besides NiFi.
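
The 8 GB heap starting point mentioned above is set in NiFi's conf/bootstrap.conf. A minimal sketch, assuming the stock argument numbering in that file (check your own bootstrap.conf before editing):

    # conf/bootstrap.conf -- JVM heap for the NiFi process (illustrative)
    # Keeping -Xms and -Xmx equal avoids heap resizing pauses
    java.arg.2=-Xms8g
    java.arg.3=-Xmx8g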

Thanks,

Matt


REPLIES

Master Mentor
@Sanaz Janbakhsh

Regarding your other queries:

Regarding VMs vs. physical servers, the VM-based pros:

1. 'Easier' node management. Some IT infrastructure teams insist on VMs, even if you map 1 physical node to 1 virtual node, because all their other infrastructure is based on VMs.

2. Taking advantage of NUMA and memory locality. There are some articles on this from virtual infrastructure providers that you can take a look at.

VM-based disadvantages (these may vary based on your usage and cluster):

1. Overhead. As an example, if you are running 4 VMs per physical node, you are running 4 OSes, 4 DataNode services, 4 NodeManagers, 4 ambari-agents, 4 metrics collectors, and 4 of any other worker services instead of one of each. These extra instances add overhead compared to running a single one of each.

2. Data locality and redundancy. There is now support for making the cluster aware of physical nodes so that no two replicas land on the same physical node, but that is extra configuration. You might also run into virtual disk performance problems if the disks are not configured properly.

Given a choice, I prefer physical servers. However, it is not always your choice. In those cases, try to get the following:

1. Explicit virtual-disk-to-physical-disk mapping. Say you have 2 VMs per physical node and each physical node has 16 data drives. Make sure to assign 8 drives to one VM and the other 8 to the second VM. This way, physical disks are not shared between VMs.

2. Don't go for more than 2 VMs per physical node. This minimizes the overhead from the services running.


For a very basic cluster setup you can have a simple two-node, non-secure, unicast cluster comprised of three instances of NiFi: the NCM, Node 1, and Node 2. Please see: https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.2/bk_administration/content/clustering.html

Expert Contributor

Hi Jay SenSharma,

Thanks a lot for the useful links and information.

Just one more question: does this mean that for the basic cluster setup I need to provision 3 servers (one master and two slaves)? Also, does the NCM still exist in HDF 2.0? I've read somewhere that it no longer exists in the new version.

Thanks,

SJ

Expert Contributor

Hi Matt,

Thanks. So for the 3 nodes that you recommend, since the NCM no longer exists, do we still have one master and 2 slave nodes?

SJ

Master Mentor

@Sanaz Janbakhsh

It is "zero master clustering". All nodes in an HDF 2.0 (NiFi 1.x) cluster run the dataflow and do work on FlowFiles. An election is conducted and at completion of that election one node will be elected as the cluster coordinator and one node will be elected as the primary node (run primary node only configured processors). Which node in the cluster is assigned these roles can change at anytime should the previously elected node should stop sending heartbeats in the configured threshold. It also possible for same node to be elected both roles.

This also means that any node in an HDF 2.0 cluster can be used for establishing Site-to-Site (S2S) connections. In older NiFi versions, S2S to a cluster required that the RPG point at the NCM.
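
For S2S, each node advertises its own input endpoint in nifi.properties, and the Remote Process Group can point at the URL of any node in the cluster; a minimal sketch with a placeholder host and port:

    # nifi.properties -- Site-to-Site input, configured on every node (illustrative)
    nifi.remote.input.host=nifi-node1.example.com
    nifi.remote.input.secure=false
    # RAW socket transport port (leave blank to disable RAW S2S)
    nifi.remote.input.socket.port=10000
    # HTTP(S) transport rides on the normal NiFi web port
    nifi.remote.input.http.enabled=true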

Thanks,

Matt

Master Mentor

@Sanaz Janbakhsh

If you found the information provided useful, please accept that answer.

Thank you,

Matt