What are the different measures and best practices for capacity planning my Hadoop cluster? We expect large amounts of data to come in, so we would like to follow best practices for capacity planning and hardware selection as the cluster grows. Please advise.
Be very wary of any hardware planning guide that's more than 3 or 4 months old. Hardware moves fast! 40-core machines with 768 GB of memory and 6, 8, or 10 TB drives are available today. Many Hadoop clusters waste CPU and network capacity because they don't buy large enough machines.
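Whatever hardware you pick, a useful first measure is a back-of-the-envelope storage estimate. The sketch below is only an illustration under stated assumptions. It uses the HDFS default replication factor of 3 (`dfs.replication`). The 25% reserve for intermediate/temporary data, the compression ratio, the growth figures, and the 12-drives-per-node layout are hypothetical placeholders that you should replace with your own numbers.

```python
import math


def projected_data_tb(initial_tb, monthly_growth_tb, months):
    """Logical data volume after `months` of linear growth (assumed growth model)."""
    return initial_tb + monthly_growth_tb * months


def raw_storage_tb(data_tb, replication=3, temp_fraction=0.25, compression_ratio=1.0):
    """Raw disk capacity needed to hold `data_tb` of logical data.

    replication: HDFS replication factor (3 is the HDFS default).
    temp_fraction: share of raw disk kept free for intermediate/temp data
                   (0.25 is an assumed rule of thumb, not a Hadoop setting).
    compression_ratio: logical size / stored size (1.0 means no compression).
    """
    stored_tb = data_tb / compression_ratio * replication
    return stored_tb / (1.0 - temp_fraction)


def nodes_needed(data_tb, drives_per_node, drive_tb, **kwargs):
    """Number of worker nodes whose disks cover the raw storage estimate."""
    per_node_tb = drives_per_node * drive_tb
    return math.ceil(raw_storage_tb(data_tb, **kwargs) / per_node_tb)


if __name__ == "__main__":
    # Hypothetical inputs: 500 TB today, growing 20 TB/month, planned for 24 months.
    data = projected_data_tb(500, 20, 24)            # 980 TB of logical data
    print(raw_storage_tb(data))                      # 3920.0 TB raw
    # Compare hypothetical 12-drive nodes with 8 TB vs 10 TB drives.
    print(nodes_needed(data, drives_per_node=12, drive_tb=8))   # 41
    print(nodes_needed(data, drives_per_node=12, drive_tb=10))  # 33
```

The comparison at the end shows how larger drives cut the node count for the same data. Storage is only one dimension, though: check that the resulting node count also gives you enough cores, memory, and network bandwidth for your workloads.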