I want to build a cluster that receives about 2 TB of input data every day, and I want to process that data on a daily basis. How do I decide how many worker nodes the cluster should have, and what hardware configuration each node needs?
Suppose I have one server with 128 GB of RAM and 32 cores, and I create 4 VMs on it, each with 32 GB of RAM and 8 cores. Will each VM work as a worker node? Does this setup give parallel data processing performance?
What would the performance difference be between 4 VMs on a single server and 4 separate servers with the same capacity (32 GB of RAM and 8 cores each)?
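
To make the numbers concrete, here is a rough back-of-envelope sketch of the load I am describing. It only does arithmetic on the figures above (2 TB per day, 4 workers, 8 cores each). The 24-hour processing window, the decimal definition of a terabyte, and the function names are my own assumptions for illustration, and an even split across workers is an idealization.

```python
# Back-of-envelope sizing arithmetic; not a benchmark or a recommendation.
TB = 10**12  # assumption: decimal terabyte (10^12 bytes)


def required_throughput_mb_s(daily_bytes, window_hours):
    """Sustained rate (MB/s) needed to finish daily_bytes within window_hours."""
    return daily_bytes / (window_hours * 3600) / 10**6


def share_per_worker(daily_bytes, workers, cores_per_worker):
    """Bytes each worker and each core would handle per day if split evenly."""
    per_node = daily_bytes / workers
    per_core = per_node / cores_per_worker
    return per_node, per_core


daily = 2 * TB
rate = required_throughput_mb_s(daily, 24)  # assumption: full 24-hour window
per_node, per_core = share_per_worker(daily, workers=4, cores_per_worker=8)

print(f"cluster-wide rate: {rate:.2f} MB/s")
print(f"per worker per day: {per_node / 10**9:.1f} GB")
print(f"per core per day: {per_core / 10**9:.1f} GB")
```

If the data has to be processed in a shorter window (for example 4 hours instead of 24), the required rate scales up proportionally, which is part of why I am unsure how many nodes are needed.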