Member since: 11-29-2016
Posts: 6
Kudos Received: 0
Solutions: 0
11-29-2016 06:58 PM
I believe the only overall HDFS limit is determined by how much memory is available on the NameNode, since it keeps the metadata for every file, directory, and block in the cluster in its heap.
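For a rough sense of what that means in practice, here is a small back-of-envelope sketch in Python. The ~150 bytes per namespace object and the blocks-per-file ratio are commonly quoted rules of thumb, not exact figures, and the helper name is just illustrative:

    # Back-of-envelope estimate of NameNode heap for a given file count.
    # Assumption: each namespace object (file, directory, block) costs
    # roughly 150 bytes of heap -- a rule of thumb, not an exact figure.
    BYTES_PER_OBJECT = 150

    def namenode_heap_gb(num_files, blocks_per_file=1.5, headroom=2.0):
        """Estimate NameNode heap in GB, with headroom for GC and growth."""
        objects = num_files * (1 + blocks_per_file)
        return objects * BYTES_PER_OBJECT * headroom / 1e9

    if __name__ == "__main__":
        for files in (10_000_000, 100_000_000, 500_000_000):
            print(f"{files:>12,} files -> ~{namenode_heap_gb(files):.0f} GB heap")

By that math a few hundred million files already pushes the NameNode heap toward sizes only the largest servers can hold, which is why namespace size rather than raw disk is usually the first wall you hit.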
11-29-2016 01:16 AM
Hello, the '90s are calling and they want their Ethernet back! Honestly, in this day and age it is patently ridiculous to use only 1Gb Ethernet to a server, and Hortonworks should be ashamed that it is still in their documentation. Even 10Gb is getting long in the tooth: 25Gb is shipping and looking good, and 40Gb backbone links are moving to 100Gb.
11-29-2016 01:09 AM
I have heard this referred to as the "Swiss cheese" method of server allocation, because you just put servers wherever there are holes. But I hope you have a fast backbone: when Spark or Hadoop jobs that use map/reduce reach the shuffle phase, all hell breaks loose on the network. Then your network admins will beg you to use separate racks.
11-29-2016 12:54 AM
Be very wary of any hardware planning guide that is more than three or four months old; hardware moves fast! Machines with 40 cores, 768GB of memory, and 6, 8, or 10TB drives are here today. Many Hadoop clusters waste CPU and network capacity by not buying large enough machines.
11-29-2016 12:48 AM
I'm not offering a direct answer, just a couple of things to think about. If your total data size is only 1.5TB, it can all be kept in memory across a small number of nodes. Since it is in memory, you won't need HDFS; you can use S3 or EBS instead. Start with 8 nodes at 256GB of memory each, then add vcores if you need more performance.
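As a quick sanity check on that sizing, here is a small Python sketch. The node count and per-node memory are just the numbers from this post; adjust them for your own workload:

    # Sanity check: total cluster memory vs. the 1.5TB dataset mentioned above.
    DATASET_TB = 1.5
    NODES = 8
    MEM_PER_NODE_GB = 256

    total_mem_tb = NODES * MEM_PER_NODE_GB / 1024
    print(f"Total cluster memory:  {total_mem_tb:.2f} TB")                  # 2.00 TB
    print(f"Headroom over dataset: {total_mem_tb - DATASET_TB:.2f} TB")     # 0.50 TB
    # The ~0.5TB of headroom is what the OS, YARN/Spark overhead, and
    # intermediate data have to fit into -- tight, but workable if the
    # dataset really stays around 1.5TB.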
11-29-2016 12:36 AM
A couple of comments:

1. The section on setting up a dual-homed network is correct, but misleading. Most people who set up dual-homed networks would expect to spread at least some of the load across the interfaces, but Hadoop code is simply not network-aware in that sense. So it is *much* better to use bonding/link aggregation for network redundancy.

2. In this day and age, don't even think about using 1Gb ports. Use at least two 10Gb ports. Cloud providers are *today* installing 50Gb networking to their servers, as 2x25Gb or 1x50Gb. You're wasting a lot of CPU if you don't give your nodes enough bandwidth.
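If you want to check what a node actually has, here is a small sketch (Python, Linux only) that reads the standard sysfs files exposed by the kernel bonding driver. The interface names it prints depend on your hosts; treat it as a diagnostic aid under those assumptions, not part of any Hadoop tooling:

    # Sketch: report bonded interfaces (mode and slaves) and raw link speeds
    # on a Linux worker node, using only standard sysfs files.
    import os

    SYS_NET = "/sys/class/net"

    def read(path):
        try:
            with open(path) as f:
                return f.read().strip()
        except OSError:
            return None

    for iface in sorted(os.listdir(SYS_NET)):
        bond_dir = os.path.join(SYS_NET, iface, "bonding")
        if os.path.isdir(bond_dir):
            mode = read(os.path.join(bond_dir, "mode"))      # e.g. "802.3ad 4"
            slaves = read(os.path.join(bond_dir, "slaves"))  # e.g. "eth0 eth1"
            print(f"{iface}: bond mode={mode} slaves={slaves}")
        else:
            speed = read(os.path.join(SYS_NET, iface, "speed"))  # Mb/s, if link is up
            if speed and speed.isdigit():
                print(f"{iface}: {int(speed) // 1000} Gb/s link")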