
What are the cluster-wide bandwidth limitations on S3?

Contributor

I cannot get any more than about 105 MB/sec into or out of S3 on an AWS cluster with 8 big nodes. (S3 is in the same region.) It takes many parallel requests to get even this much; each individual mapper seems to cap out at barely more than single-digit MB/sec. It seems to be a limit on the VPC, but there must be a way to increase it, as I read about people processing petabytes of S3 data. Can anyone offer suggestions on how to configure for reasonable S3 performance?

Also, can anyone shed light on how the networking works underneath? Is the S3 traffic coming over the same 10Gb LAN that the instances are using? Does EBS traffic go over the LAN too? Total EBS traffic seems in practice to be limited to about 350 MB/sec across the cluster.

Does EBS use the same LAN as inter-node traffic? If so, it would seem impossible to ever exceed a total of about 1.25 GB/sec of disk I/O for everything. That can't be right, given the size of the clusters I hear about. What's going on?


6 REPLIES

@Peter Coates

Because of the nature of the S3 object store, data written to an S3A OutputStream is not written incrementally; instead, by default, it is buffered to disk until the stream's close() method is called. This can make output slow: the further the process is from the S3 endpoint, or the smaller the EC2-hosted VM, the longer the work takes to complete.

Work to address this began in Hadoop 2.7 with S3AFastOutputStream (HADOOP-11183) and has continued with S3ABlockOutputStream (HADOOP-13560). With incremental writes of blocks, "S3A fast upload" offers an upload time at least as fast as the "classic" mechanism, with significant benefits on long-lived output streams and when very large amounts of data are generated.
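
For reference, here is a minimal sketch of how the fast-upload path might be switched on programmatically through the documented S3A properties (fs.s3a.fast.upload, and on Hadoop 2.8+ fs.s3a.fast.upload.buffer and fs.s3a.fast.upload.active.blocks). It assumes the hadoop-aws module and AWS credentials are already configured; the bucket and key names are placeholders, not anything from this thread, and the same settings can just as well go in core-site.xml:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3AFastUploadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Switch from "buffer everything until close()" to incremental block uploads.
        conf.setBoolean("fs.s3a.fast.upload", true);

        // Hadoop 2.8+ (HADOOP-13560): where blocks are buffered before upload:
        // "disk" (default), "array" (on-heap), or "bytebuffer" (off-heap).
        conf.set("fs.s3a.fast.upload.buffer", "disk");

        // Size of each uploaded block and how many blocks one stream may queue
        // before it applies back-pressure to the writer.
        conf.setLong("fs.s3a.multipart.size", 64L * 1024 * 1024);
        conf.setInt("fs.s3a.fast.upload.active.blocks", 4);

        // "my-bucket" and the output key below are placeholders for this sketch.
        FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket/"), conf);
        byte[] chunk = new byte[8 * 1024 * 1024];
        try (FSDataOutputStream out = fs.create(new Path("s3a://my-bucket/demo/part-00000"))) {
            for (int i = 0; i < 16; i++) {
                out.write(chunk); // blocks are uploaded as they fill, not only at close()
            }
        }
        fs.close();
    }
}
```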

Please see Hadoop-AWS module: Integration with Amazon Web Services for more information on Hadoop-AWS integration.

On instances without support for EBS-optimized throughput, network traffic can contend with traffic between your instance and your EBS volumes. EBS–optimized instances deliver dedicated bandwidth to Amazon EBS, with options between 500 Mbps and 10,000 Mbps, depending on the instance type you use. When you enable EBS optimization for an instance that is not EBS–optimized by default, you pay an additional low, hourly fee for the dedicated capacity.
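
Since the EBS figures are quoted in megabits per second while the numbers elsewhere in this thread are megabytes per second, here is a small back-of-the-envelope sketch (using only figures already mentioned above) to put them in the same units:

```java
public class BandwidthArithmetic {
    // Convert a link rate in megabits/sec to megabytes/sec (8 bits per byte).
    static double mbpsToMBps(double megabitsPerSecond) {
        return megabitsPerSecond / 8.0;
    }

    public static void main(String[] args) {
        // Dedicated EBS bandwidth range quoted above.
        System.out.printf("500 Mbps    ~ %.1f MB/sec%n", mbpsToMBps(500));     // ~62.5 MB/sec
        System.out.printf("10,000 Mbps ~ %.1f MB/sec%n", mbpsToMBps(10_000));  // ~1250 MB/sec

        // Observed cluster-wide limits from the question, converted the other way.
        System.out.printf("350 MB/sec of EBS traffic ~ %.0f Mbps%n", 350 * 8.0); // ~2800 Mbps
        System.out.printf("105 MB/sec of S3 traffic  ~ %.0f Mbps%n", 105 * 8.0); // ~840 Mbps
    }
}
```

By that arithmetic, the observed ~350 MB/sec of EBS traffic is roughly 2.8 Gb/sec and the ~105 MB/sec S3 ceiling is under 1 Gb/sec, so neither number on its own pins the bottleneck on a 10Gb instance NIC.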

Contributor

Tom--thanks for the reply. With regard to EBS, yes, all the nodes in question are EBS-optimized. The critical question is: do ALL those optimized instances share a single 10Gb network to the EBS hosts?

As near as I can tell empirically, as the number of writers or readers increases, the total throughput asymptotically approaches a limit of about one third of a 10Gb link's capacity. No matter how many readers or writers, that's the limit.

The docs are clear that "optimized" puts a node's EBS on a different LAN but do not seem to indicate whether that alternate network is also 10Gb/sec. Or possibly there is some fancier topology with multiple alternate LANs, etc.

The same general behavior happens with S3, but with the limit at about 105 MB/sec instead of 350 MB/sec. It takes about 8 or 10 threads to hit the limit. Individual readers never go over about 12 MB/sec.

My numbers are consistent with benchmarks that I've found online, including the asymptotic thing. But I don't think that can really be the limit--Netflix claims they query a petabyte of S3 daily. It would have to be a very long day!

ACCEPTED SOLUTION

@Peter Coates

I think the AWS doc is pretty clear on this: "EBS–optimized instances deliver dedicated bandwidth to Amazon EBS". How they do that is unclear, but it is safe to say that creating an overlay network (e.g., VPC, EBS-Optimized, etc.) would require the use of SDN technology. That said, there is a pretty descriptive chart of expected bandwidth, throughput, and IOPS for Amazon EBS-optimized instances here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html

Regarding your observations on S3 performance: you are paying a lot less for S3, so most of this is to be expected. However, the link I shared with you previously covers the work we are driving in the community around Hadoop-AWS integration. This work deals with the differences between how HDFS needs to be optimized to interact with blob storage APIs (such as S3A) going forward and how it historically interacted with direct-attached storage.


@Peter Coates

As always, it is good to hear from you.

In lieu of answering all of your questions outright (since several of them deal with Amazon proprietary IP): if I helped progress your research here, I'd be very appreciative if you could Accept my answer.

Thanks. Tom

Contributor

Tom--yes, that all sounds right to me. Great answers. It's quite remarkable that they can do this when you consider the implications. It's easy to see how it's done on the network, as they can allocate bandwidth with a fixed usage guarantee and a known capacity on the link. What is mystifying is how the physical infrastructure of the volumes can support this, as the amount of work they do is highly variable depending on the specific workload. Whatever is behind the volumes is clearly not your grandpa's NAS, as it can flatten out huge random demand peaks from many users.


Buckets are somehow sharded: the more load you put on the same bucket, the less bandwidth each reader apparently gets. This sharding is based on the filename: the more diverse your filenames, the better your bandwidth is likely to be. This is mentioned in the AWS docs, but they are deliberately vague about what happens, so that they have the freedom to change the sharding policy without complaints about backwards compatibility.
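
To illustrate the "diverse filenames" point, one common approach (a sketch of a widely used pattern, not something prescribed by this thread or guaranteed by AWS) is to prepend a short hash of the natural key so that objects do not all share one lexicographic prefix:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class KeyPrefixSketch {
    // Prepend a short hex hash so keys do not all start with the same prefix.
    static String shardedKey(String naturalKey) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(naturalKey.getBytes(StandardCharsets.UTF_8));
        String prefix = String.format("%02x%02x", digest[0], digest[1]);
        return prefix + "/" + naturalKey;
    }

    public static void main(String[] args) throws Exception {
        // The keys below are made up purely for illustration.
        System.out.println(shardedKey("logs/2016-09-01/host42.gz"));
        System.out.println(shardedKey("logs/2016-09-01/host43.gz"));
        // Each key now begins with a hash-derived prefix instead of the shared "logs/2016-..." prefix.
    }
}
```

Whether this helps in any particular case depends on how S3 happens to be partitioning the bucket at the time, which, as noted above, AWS keeps deliberately vague.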

You are also going to get different bandwidth numbers depending on the network capacity of your VM, and a faster read rate than write rate.

When Netflix talks about its performance, assume multiple buckets and many, many readers.