Created 10-05-2015 07:45 PM
Created 10-05-2015 07:49 PM
What kind of speed/throughput are you seeing? Are you just writing to an S3 bucket from within an AWS instance? Not sure if you saw this post, but it may help a bit: http://www.rightscale.com/blog/cloud-industry-insights/network-performance-within-amazon-ec2-and-ama...
Created 10-05-2015 08:16 PM
OK, that helps. Here is what I found related to this; hope it helps.
Test: Copy 50 GB of data from a Hadoop cluster running on Amazon Elastic Compute Cloud (EC2) in Virginia to an Amazon S3 bucket in Oregon. Note the difference between S3DistCp and DistCp; your results may vary.
Method     Data Size Copied   Total Time
DistCp     50 GB              26 min
S3DistCp   50 GB              19 min
See here for more context: https://media.amazonwebservices.com/AWS_Amazon_EMR_Best_Practices.pdf
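To compare those timings with whatever throughput you are seeing, here is a rough sketch that converts the two results into MB/s. It only uses the 50 GB / 26 min / 19 min figures from the test; the 1 GB = 1024 MB convention and the function name are my own assumptions, so treat the output as approximate.

```python
# Rough throughput estimate from the copy times quoted in the test.
# Assumption: 1 GB = 1024 MB; the source does not say which convention it uses.

def throughput_mb_per_s(size_gb, minutes, mb_per_gb=1024):
    """Average throughput in MB/s for a copy of size_gb that took `minutes`."""
    return size_gb * mb_per_gb / (minutes * 60)

# Total copy time in minutes for the 50 GB test, per method.
runs = {"DistCp": 26, "S3DistCp": 19}

rates = {name: throughput_mb_per_s(50, minutes) for name, minutes in runs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} MB/s")

# Ratio of copy times: how many times faster S3DistCp was in this one test.
speedup = runs["DistCp"] / runs["S3DistCp"]
print(f"S3DistCp was about {speedup:.2f}x faster")
```

That works out to roughly 33 MB/s for DistCp and 45 MB/s for S3DistCp in this one cross-region test, so it is only a ballpark to compare against.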
Created 10-05-2015 08:18 PM
thanks @drice@hortonworks.com
Created 10-08-2015 11:37 AM
Created 11-17-2015 05:26 PM
There is a new S3 driver that Gopal has written that is supposed to be as fast as the driver from AWS. If you are implementing in the field, please reach out to him. It is supposed to be part of HDP 2.4 afaik 🙂