Which HDP cluster setup is best: physical nodes, or multiple VMs on a few physical nodes?

Hi All,

We have six physical machines. Which cluster setup is better: using the physical machines as they are, or creating multiple VMs on top of them to build a bigger cluster?

The machines are highly available and have more than 450 GB of RAM.

Please advise!

1 ACCEPTED SOLUTION

There are pros and cons to both. VMs have a negative impact on performance, so we would normally go for bare metal. MapReduce scales well to lots of disks and processes, even on a single DataNode.

However, there are limits on VERY big nodes (there are new Apollo servers with 24 drives): you will want to increase the HDFS DataNode memory, and you may have issues with very big block reports being sent around. In that case, logically splitting a node into multiple smaller VMs might solve these issues.

But normally I would say go bare metal.
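To get a rough feel for the block-report concern on very dense nodes, here is a back-of-the-envelope sketch. It only counts how many block replicas a single DataNode might hold and how that count drops if the node is split into several VMs. The function names, the 4 TB drive size, and the 70% fill ratio are illustrative assumptions, not figures from this thread; 128 MB is the default dfs.blocksize in Hadoop 2.x, and real clusters often average smaller blocks because of small files.

```python
def estimated_block_replicas(num_drives, drive_tb, fill_ratio, avg_block_mb):
    """Rough number of block replicas stored on one DataNode.

    All inputs are assumptions supplied by the caller; this is simple
    capacity arithmetic, not something Hadoop computes for you.
    """
    used_bytes = num_drives * drive_tb * 10**12 * fill_ratio
    block_bytes = avg_block_mb * 2**20
    return int(used_bytes // block_bytes)


def replicas_per_vm(total_replicas, num_vms):
    """Replicas each DataNode reports if the host is split evenly into VMs."""
    # Ceiling division so the per-VM figure is never underestimated.
    return -(-total_replicas // num_vms)


if __name__ == "__main__":
    # Hypothetical dense node: 24 drives of 4 TB each, 70% full, 128 MB blocks.
    total = estimated_block_replicas(24, 4, 0.7, 128)
    for vms in (1, 2, 4):
        print(f"{vms} DataNode(s) on the host: "
              f"about {replicas_per_vm(total, vms)} replicas per block report")
```

Splitting the host only makes each individual block report smaller; the total data, and the performance overhead of the VMs, stay the same, which is why bare metal remains the default choice.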

2 REPLIES
Hi @Uday Vakalapudi, typically you will be better off with multiple machines (scale out) rather than a smaller number of large machines (scale up).

If you consider the way that Hadoop works, jobs are effectively distributed across the whole cluster and all the resources can be utilised simultaneously. This is the opposite of what virtualisation is typically designed for, which is multiple machines with different workloads and different workload profiles (I/O, CPU, memory).

My short suggestion would be: if you're just looking at a test/dev/pilot system, then multiple VMs are fine. But for production, consider scaling out on bare metal.

Hope that helps.
