Impact of using large vs small aws instances for datanodes

Hello,

Would there be any impact (for example, on YARN resource allocation or Impala query performance) from running a cluster with fewer large instances versus many smaller instances for datanodes?

For example, would running the cluster with 10 d2.4xlarge instances instead of 20 d2.2xlarge instances make any difference? The d2.4xlarge configuration is more or less twice the d2.2xlarge configuration.

20 d2.2xlarge instances cost $22,106.40/month
10 d2.4xlarge instances cost $21,154.80/month

I can see a cost saving here, plus the ease of managing fewer datanodes. Kindly let me know your thoughts.

Thanks,
Mani
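
To put a number on the cost difference quoted in the question, here is a minimal sketch in Python. The dictionary layout and the `monthly_savings` helper are illustrative names, not part of any AWS or Cloudera tooling, and the prices are the ones given in the post, not checked against current AWS pricing.

```python
# Monthly totals in USD, exactly as quoted in the question.
# Assumption: these figures are taken at face value, not re-derived from AWS pricing.
small_fleet = {"instance": "d2.2xlarge", "count": 20, "monthly_total": 22106.40}
large_fleet = {"instance": "d2.4xlarge", "count": 10, "monthly_total": 21154.80}


def monthly_savings(baseline, alternative):
    """Return (absolute, percent) monthly savings of `alternative` versus `baseline`."""
    diff = baseline["monthly_total"] - alternative["monthly_total"]
    percent = 100 * diff / baseline["monthly_total"]
    return round(diff, 2), round(percent, 2)


def cost_per_node(fleet):
    """Return the monthly cost of a single instance in the fleet."""
    return round(fleet["monthly_total"] / fleet["count"], 2)


savings, pct = monthly_savings(small_fleet, large_fleet)
print(f"Monthly savings with fewer, larger nodes: ${savings:.2f} ({pct:.2f}%)")
for fleet in (small_fleet, large_fleet):
    print(f"{fleet['count']} x {fleet['instance']}: ${cost_per_node(fleet):.2f} per node per month")
```

With the quoted figures the difference comes to about $951.60 a month, or roughly 4.3%, so the pricing gap on its own is small.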

Re: Impact of using large vs small aws instances for datanodes

More, smaller nodes can negatively impact performance, even if the total
amount of RAM and other resources remains the same. For one, there's more
overhead for more instances of the operating system, but there's also more
interaction and data transfer needed between the nodes for reduce
operations, joins, etc. How much it impacts you depends a lot on the
workload.