Created on 04-02-2014 04:18 AM - edited 09-16-2022 01:56 AM
Hi,
I have a question about the evolution of hardware technologies, Hadoop, and datacentre management.
The factors in play: when planning your hardware strategy, there are always new technologies arriving, such as SSDs and 10GE networking.
How do you make decisions around hardware lifecycle management?
I'd like to hear your pros and cons: would you try to keep upgrading all your infrastructure so that everything stays at the same spec, or would you run with wildly varying infrastructure specs, and is there any functional impact on your cluster either way?
Created 04-06-2014 12:37 AM
Thanks,
I've been managing 30-40 node clusters in my team (soon >60), and we've found that CPU and memory differences between nodes are easy to manage through MapReduce configuration.
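For example, here is a minimal sketch of roughly what I mean by handling it in configuration: deriving per-node MR1 slot settings from each hardware class's cores and RAM. The ratios and reserve sizes are illustrative assumptions, not a recommendation.

    # Sketch: derive per-node MR1 slot settings from a node's hardware spec.
    # The ratios below are illustrative assumptions, not tuning guidance.

    def mr1_slot_settings(cores, ram_gb, heap_mb_per_task=1024, reserve_gb=4):
        """Return suggested mapred-site.xml values for one hardware class."""
        usable_gb = max(ram_gb - reserve_gb, 1)          # leave room for OS/daemons
        by_memory = (usable_gb * 1024) // heap_mb_per_task
        total_slots = min(2 * cores, by_memory)          # cap by CPU and by memory
        map_slots = max(int(total_slots * 2 / 3), 1)     # roughly 2:1 map:reduce split
        reduce_slots = max(total_slots - map_slots, 1)
        return {
            "mapred.tasktracker.map.tasks.maximum": map_slots,
            "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
            "mapred.child.java.opts": "-Xmx%dm" % heap_mb_per_task,
        }

    # Example: an older 8-core/48GB node vs a newer 16-core/128GB node.
    print(mr1_slot_settings(cores=8, ram_gb=48))
    print(mr1_slot_settings(cores=16, ram_gb=128))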
But newer generations of hardware are about to hit the market, SSDs are getting cheaper, and 10GE networking is now almost a standard in most datacentres.
So should we upgrade our existing nodes to 10GE? Or should we keep everything mixed and allow the software to handle it?
Created 04-07-2014 07:47 AM
@kiyengar Whether you should upgrade your infrastructure is a decision you will have to make based on how your cluster performs today compared with how you expect your load to grow over the near to mid term.
While we can't provide you a one-size-fits-all answer for that, I CAN tell you that many of the clusters we see in production now do use 10GbE networking. In fact, the reference architectures from our partners Dell and Oracle go well beyond that and actually utilize dual bonded 10GbE and/or bonded InfiniBand network configurations for their Hadoop cluster offerings.
The answer to your question will really come down to "are you seeing, or will you soon see, network saturation in your cluster?" If your cluster is handling the workloads you are throwing at it just fine, then there is no need to upgrade. If your network is becoming, or will soon become, a serious bottleneck in your throughput, then definitely look to upgrade. Make sure to work closely with your hardware vendor to ensure the latest 10GbE hardware/firmware you plan to use is bug-free, however, as we've seen many issues with early-generation firmware on some network adapters.
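If you want a quick sanity check that a given pair of nodes can actually push line rate (before and after an upgrade, or when chasing a suspected firmware/driver issue), something like the sketch below will do. It is a minimal plain-TCP test with an arbitrary port and transfer size, not a substitute for a proper tool such as iperf.

    # Minimal point-to-point TCP throughput check between two nodes.
    # Run "python netcheck.py server" on one node, then
    # "python netcheck.py client <server-host>" on another.
    # The port and transfer size are arbitrary assumptions.
    import socket, sys, time

    PORT = 5001
    CHUNK = b"\0" * (1 << 20)      # 1 MiB send buffer
    TOTAL_MB = 2048                # send 2 GiB in total

    def server():
        srv = socket.socket()
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", PORT))
        srv.listen(1)
        conn, _addr = srv.accept()
        received = 0
        start = time.time()
        while True:
            data = conn.recv(1 << 20)
            if not data:
                break
            received += len(data)
        elapsed = time.time() - start
        print("received %.1f MB in %.1f s = %.2f Gbit/s"
              % (received / 1e6, elapsed, received * 8 / elapsed / 1e9))

    def client(host):
        sock = socket.create_connection((host, PORT))
        for _ in range(TOTAL_MB):
            sock.sendall(CHUNK)
        sock.close()

    if __name__ == "__main__":
        if sys.argv[1] == "server":
            server()
        else:
            client(sys.argv[2])

Run the server on one node and the client on another, then compare the reported Gbit/s against what the link should deliver.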
Regarding SSDs, make sure you fully understand all the nuances of using and configuring them correctly before you head down that path. This blog is a very helpful place to start.
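It also helps to quantify the gap on your own hardware before committing: HDFS does mostly large sequential I/O, which is where SSDs offer the least advantage over spinning disks, so a rough sequential-write comparison between an existing data disk and a candidate SSD is a useful first data point. A sketch follows; the mount points are hypothetical, and oflag=direct bypasses the page cache so the device itself is measured.

    # Rough sequential-write comparison between two mount points (e.g. an
    # existing HDD data dir and a candidate SSD). Paths are hypothetical.
    import os, subprocess, time

    def seq_write_mb_per_s(mount_point, size_mb=1024):
        target = os.path.join(mount_point, "throughput_test.tmp")
        start = time.time()
        # oflag=direct bypasses the page cache so we measure the device itself.
        subprocess.check_call(
            ["dd", "if=/dev/zero", "of=" + target,
             "bs=1M", "count=%d" % size_mb, "oflag=direct"],
            stderr=open(os.devnull, "w"))
        elapsed = time.time() - start
        os.remove(target)
        return size_mb / elapsed

    for mount in ("/data/1", "/data/ssd1"):    # hypothetical mount points
        print(mount, "%.0f MB/s sequential write" % seq_write_mb_per_s(mount))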
Created 04-07-2014 08:26 AM
So one more question I had.
Is it purely a non-functional performance consideration based on workloads?
Is it ever a concern that any of the software components in the Cloudera stack would actually cause job failures (or, even worse, successful completions that produce a corrupt dataset) through mixing, say, bonded 1GE and 10GE racks of servers? We're running HBase, MapReduce and very light Impala on our cluster of over 60 nodes, and we're thinking of moving to 10GE for nodes 60-100. But we're not sure if we should also upgrade the existing 60 nodes.
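For background on how the cluster even distinguishes the two kinds of racks: as far as I understand it, that comes down to the standard Hadoop rack topology script pointed at by net.topology.script.file.name. A minimal sketch, with hypothetical host names and rack labels, would be something like:

    #!/usr/bin/env python
    # Minimal Hadoop rack-topology script: Hadoop invokes it with one or more
    # hostnames/IPs as arguments and expects one rack path printed per argument.
    # Point net.topology.script.file.name at this file. The mappings below are
    # hypothetical placeholders for the 1GE and 10GE racks.
    import sys

    RACKS = {
        "node001.example.com": "/dc1/rack-1ge-01",
        "node061.example.com": "/dc1/rack-10ge-01",
    }
    DEFAULT_RACK = "/default-rack"

    for host in sys.argv[1:]:
        print(RACKS.get(host, DEFAULT_RACK))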
We'll do some investigation now to determine whether our jobs are network bound. But there doesn't seem to be an easy way of measuring this other than through the Chart views, looking at total bytes received on all interfaces over time on each node. Any other suggestions?
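In case it helps anyone else, this is roughly the kind of per-node sampling I have in mind: read the byte counters from /proc/net/dev and compare the rate against the link speed reported in /sys/class/net/<iface>/speed. It is only a sketch, Linux-specific, and the interface name is an assumption (substitute the bond or NIC actually in use).

    # Sample a NIC's RX/TX throughput from /proc/net/dev and compare it to the
    # link speed reported in /sys/class/net/<iface>/speed. Linux-only; the
    # interface name below is an assumption -- substitute yours (or the bond).
    import time

    IFACE = "eth0"
    INTERVAL = 5  # seconds between samples

    def read_bytes(iface):
        with open("/proc/net/dev") as f:
            for line in f:
                if line.strip().startswith(iface + ":"):
                    fields = line.split(":", 1)[1].split()
                    return int(fields[0]), int(fields[8])   # rx_bytes, tx_bytes
        raise ValueError("interface %s not found" % iface)

    def link_speed_mbps(iface):
        with open("/sys/class/net/%s/speed" % iface) as f:
            return int(f.read().strip())

    speed = link_speed_mbps(IFACE)
    rx0, tx0 = read_bytes(IFACE)
    while True:
        time.sleep(INTERVAL)
        rx1, tx1 = read_bytes(IFACE)
        rx_mbps = (rx1 - rx0) * 8 / 1e6 / INTERVAL
        tx_mbps = (tx1 - tx0) * 8 / 1e6 / INTERVAL
        print("rx %.0f Mbit/s  tx %.0f Mbit/s  (link %d Mbit/s, %.0f%% utilised)"
              % (rx_mbps, tx_mbps, speed, 100.0 * max(rx_mbps, tx_mbps) / speed))
        rx0, tx0 = rx1, tx1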
Would anyone recommend that, in order to move to 10GE networking, all potential components of the solution MUST be upgraded? Or is it purely a call to be made based on the performance attributes of the jobs running?