What kind of speed/throughput should I be expecting to AWS S3?

1 ACCEPTED SOLUTION

Expert Contributor

OK, that helps. Here is what I found related to this; hope it helps.

Test: Copy 50 GB of data from a Hadoop cluster running on Amazon Elastic Compute Cloud (EC2) in Virginia to an Amazon S3 bucket in Oregon. Note the difference between S3DistCp and DistCp; your results may vary.

Method      Data Size Copied   Total Time
DistCp      50 GB              26 min
S3DistCp    50 GB              19 min
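To turn those timings into a rough throughput figure, here is a minimal Python sketch. It only does the arithmetic on the two numbers quoted above; the function name is made up for illustration, and it assumes 1 GB = 1000 MB (with 1024 MB per GB the figures come out slightly higher). These are end-to-end averages for that one cross-region test, not a guarantee of what you will see.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average throughput in MB/s, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / (minutes * 60)


# (data size in GB, total time in minutes) from the test quoted above
results = {"DistCp": (50, 26), "S3DistCp": (50, 19)}

rates = {method: throughput_mb_per_s(gb, mins) for method, (gb, mins) in results.items()}

for method, rate in rates.items():
    print(f"{method}: {rate:.1f} MB/s")
# DistCp: 32.1 MB/s
# S3DistCp: 43.9 MB/s
```

So for that particular test the whole cluster averaged roughly 32 MB/s with DistCp and roughly 44 MB/s with S3DistCp.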

For more context, see https://media.amazonwebservices.com/AWS_Amazon_EMR_Best_Practices.pdf


5 REPLIES

Expert Contributor

What kind of speed/throughput are you seeing? Are you just writing to an S3 bucket from within an AWS instance? Not sure if you saw this post, but it may help a bit: http://www.rightscale.com/blog/cloud-industry-insights/network-performance-within-amazon-ec2-and-ama...

@Cassandra

Here are some of the results:

[Figure: S3 vs. native HDFS. Courtesy: Professional Hadoop Solutions.]


Gopal has written a new S3 driver that is supposed to be as fast as the driver from AWS. If you are implementing in the field, please reach out to him. AFAIK, it is supposed to be part of HDP 2.4 🙂