04-07-2014
08:26 AM
One more question I had: is this purely a non-functional performance consideration based on workloads? Is it ever a concern that any of the software components in the Cloudera stack could actually cause job failures (or worse, jobs that complete successfully but produce a corrupt dataset) when mixing, say, bonded 1GE and 10GE racks of servers? We're running HBase, MapReduce and very light Impala on a cluster of over 60 nodes, and we're thinking of moving to 10GE for nodes 60-100, but we're not sure whether we should also upgrade the existing 60 nodes. We'll do some investigation now to determine whether our jobs are network bound, but there doesn't seem to be an easy way of measuring this other than the Chart views, looking at total bytes received across all interfaces over time on each node. Any other suggestions? Would anyone recommend that, in order to move to 10GE networking, all components of the solution MUST be upgraded? Or is it purely a call to be made based on the performance characteristics of the jobs running?
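For what it's worth, one rough way to sanity-check whether jobs are network bound, outside of the Chart views, is to sample the kernel's interface counters on a few nodes while a representative job runs and compare the observed throughput against the link's capacity. A minimal sketch (the interface names and the 10-second sampling interval are just placeholders, not anything specific to our cluster):

```python
#!/usr/bin/env python
# Rough sketch: sample /proc/net/dev twice and estimate per-interface
# receive/transmit throughput on this node over the interval.
import time

def read_counters(path="/proc/net/dev"):
    counters = {}
    with open(path) as f:
        for line in f.readlines()[2:]:          # skip the two header lines
            iface, data = line.split(":", 1)
            fields = data.split()
            rx_bytes, tx_bytes = int(fields[0]), int(fields[8])
            counters[iface.strip()] = (rx_bytes, tx_bytes)
    return counters

interval = 10                                    # seconds between samples
before = read_counters()
time.sleep(interval)
after = read_counters()

for iface in sorted(after):
    if iface in before:
        rx_rate = (after[iface][0] - before[iface][0]) / float(interval)
        tx_rate = (after[iface][1] - before[iface][1]) / float(interval)
        print("%-8s rx %8.1f MB/s  tx %8.1f MB/s"
              % (iface, rx_rate / 1e6, tx_rate / 1e6))
```

If the bonded 1GE interfaces sit near saturation during shuffle-heavy or HBase-compaction-heavy periods, that would be a reasonable signal that the 10GE upgrade is worth it for those nodes; sar -n DEV or the Cloudera Manager host charts should tell a similar story.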
04-06-2014
12:37 AM
Thanks. I've been managing 30-40 node clusters with my team (soon more than 60), and we've found that CPU and memory differences between nodes are easy to manage through MapReduce configuration. But newer generations of hardware are about to hit the market, SSDs are getting cheaper, and 10GE networking is now almost standard in most datacentres. So should we upgrade our existing nodes to 10GE? Or should we keep the hardware mixed and let the software handle it?
04-02-2014
04:18 AM
Hi, I have a question to pose around the evolution of hardware technologies, Hadoop, and datacentre management. The factors in play are:
- License costs tied to the number of nodes (Cloudera, monitoring tools, OS, etc.)
- Cost and commoditisation of hardware (buying whatever gives the most bang for the buck at the time)
- Datacentre and maintenance costs (reduced by standardising on hardware)
- A functionally acceptable and working configuration for your use case

When planning your hardware strategy, there are always new technologies arriving, such as:
- 1GE vs 10GE networking
- More CPU cores
- Better energy efficiency
- Cheaper RAM and disk

How do you make decisions around hardware lifecycle management? I'd like to hear your pros and cons: would you keep upgrading all your infrastructure so everything stays at the same spec, or would you run widely varying infrastructure specs, and is there any functional impact on the cluster either way?
Labels:
- Apache Hadoop