Created 03-23-2017 01:53 AM
Hi,
We are planning to start the implementation of an IoT use case (possibly 35,000 vehicle signals per minute at this time, with a small message size).
Could you please help with the following questions?
- Are physical servers recommended for HDF rather than VMs?
- What is the minimum number of nodes that needs to be deployed for clustering?
- What are the minimum hardware requirements per node?
Thanks
SJ
Created 03-23-2017 02:06 AM
Some of the S/W and H/W requirements can be found in the following link: https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.2/bk_ambari-installation/content/system-requi...
Created 03-23-2017 02:16 AM
Regarding your other queries:
Regarding VMs vs. physical servers, the pros of a VM-based setup:
1. 'Easier' node management. Some IT infrastructure teams insist on VMs, even if you want to map one physical node to one virtual node, because all their other infrastructure is based on VMs.
2. Taking advantage of NUMA and memory locality. There are some articles on this from virtual infrastructure providers that you can take a look at.
VM-based disadvantages (examples may vary based on your usage and cluster):
1. Overhead. As an example, if you are running 4 VMs per physical node, you are running 4 operating systems, 4 DataNode services, 4 NodeManagers, 4 Ambari agents, 4 Metrics Collectors, and 4 of any other worker service instead of one. These services carry overhead compared to running one of each.
2. Data locality and redundancy. There is now support for physical-node awareness, so that no two replicas land on the same physical node, but that is extra configuration. You might also run into virtual-disk performance problems if the disks are not configured properly.
Given a choice, I prefer using physical servers. However, it's not always your choice. In those cases, make sure you try to get the following:
1. Explicit virtual-disk to physical-disk mapping. Say you have 2 VMs per physical node and each physical node has 16 data drives: make sure to assign 8 drives to one VM and the other 8 to the second VM. This way, physical disks are not shared between VMs.
2. Don't go for more than 2 VMs per physical node, so that you minimize the overhead from the services running.
For a very basic cluster setup, you can have a simple two-node, non-secure, unicast cluster comprised of three instances of NiFi: the NCM, Node 1, and Node 2. Please see: https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.0.2/bk_administration/content/clustering.html
Created 03-23-2017 02:50 AM
Hi Jay SenSharma,
Thanks a lot for the useful links and information.
Just one more question: does it mean that for the basic cluster setup I need to provision 3 servers (one master and two slaves)? Also, does the NCM still exist in HDF 2.0? I've read somewhere that it no longer exists in the new version.
Thanks,
SJ
Created 03-23-2017 02:31 PM
Hi Matt,
Thanks. So, for the 3 nodes that you recommend, since the NCM no longer exists, do we still have one master and 2 slave nodes?
SJ
Created 03-23-2017 02:44 PM
It is "zero master clustering". All nodes in an HDF 2.0 (NiFi 1.x) cluster run the dataflow and do work on FlowFiles. An election is conducted and at completion of that election one node will be elected as the cluster coordinator and one node will be elected as the primary node (run primary node only configured processors). Which node in the cluster is assigned these roles can change at anytime should the previously elected node should stop sending heartbeats in the configured threshold. It also possible for same node to be elected both roles.
This also means that any node in an HDF 2.0 cluster can be used for establishing Site-to-Site (S2S) connections. In older NiFi versions, S2S to a cluster required that the RPG point at the NCM.
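For reference, S2S input on a node is controlled by the remote-input properties in nifi.properties; a minimal non-secure sketch (hostname and port are placeholders) might look like:

```properties
# Enable raw-socket Site-to-Site into this node; with zero-master clustering,
# an RPG can point at any node's address, not a dedicated NCM
nifi.remote.input.host=nifi-node1.example.com
nifi.remote.input.socket.port=10443
nifi.remote.input.secure=false
```

The RPG then discovers the other cluster nodes automatically and load-balances across them.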
Thanks,
Matt
Created 03-23-2017 02:48 PM
If you found the information provided useful, please accept that answer.
Thank you,
Matt