Created 06-15-2016 09:10 AM
Hi All,
We have six physical machines. Which cluster setup is better: using the physical machines as they are, or creating multiple VMs on top of them to build a bigger cluster?
The machines are highly available, with more than 450 GB of RAM.
Please advise!
Created 06-15-2016 01:23 PM
There are pros and cons to both. VMs have a negative impact on performance, so we would normally go for bare metal. MapReduce scales well to lots of disks and processes, even on a single DataNode.
However, there are limits on VERY big nodes (there are new Apollo servers with 24 drives): you will want to increase the HDFS DataNode memory, and you may have issues with very big block reports being sent around. In that case, logically splitting a node into multiple smaller VMs might solve these issues.
But normally I would say go bare metal.
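To get a rough feel for the block report concern above, here is a minimal back-of-the-envelope sketch. It is only an estimate under stated assumptions: the function name, the 4 TB drive size, the 70% fill level and the 128 MB average block size are all made up for illustration, not measurements from any real cluster.

```python
def estimate_block_replicas(num_drives, drive_tb, avg_block_mb, fill_percent):
    """Rough number of block replicas one DataNode would hold.

    Hypothetical helper for illustration only. It ignores reserved
    space and metadata overhead, and uses decimal units
    (1 TB = 1,000,000 MB).
    """
    usable_mb = num_drives * drive_tb * 1_000_000 * fill_percent // 100
    return usable_mb // avg_block_mb


# One big 24-drive node (assumed: 4 TB drives, 70% full, 128 MB average block)
whole_node = estimate_block_replicas(24, 4, 128, 70)

# The same hardware split into 4 VMs with 6 drives each
per_vm = estimate_block_replicas(6, 4, 128, 70)

print(whole_node, per_vm)
```

Every replica counted here is one entry in the DataNode's block report, so the smaller the average block (for example with lots of small files), the bigger the report from a single large node gets. Splitting the node cuts the per-DataNode count, but the total across the cluster stays the same.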
Created 06-15-2016 11:46 AM
Hi @Uday Vakalapudi, typically you will be better off with multiple machines (scale out) rather than a smaller number of large machines (scale up).
If you consider the way that Hadoop works, jobs are effectively distributed across the whole cluster and all the resources can be utilised simultaneously. This is the opposite of what virtualisation typically handles, which is multiple machines with different workloads and different workload profiles (I/O, CPU, memory).
My short suggestion would be: if you're just looking at a test/dev/pilot system, then multiple VMs are fine. But for production, consider scale out on bare metal.
Hope that helps.