New Contributor
Posts: 1
Registered: 08-08-2017

S3 Storage - Best Practices

Our team is setting up a data lake on our Hadoop cluster. The data lake will be split into two storage layers: all raw/cold data will be stored in S3, while transformed/hot data will be stored in EBS/HDFS. The goal is to minimize EBS storage costs for "cold" data that will not be accessed frequently.


I am looking for best practices and guidance on how to structure the storage buckets and the objects within them. Are there any do's and don'ts here? Is it better to have a handful of buckets and organize all the data inside them, or to have a large number of buckets?
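
To make the question concrete, here is a rough sketch of the two layouts I am comparing. All bucket names, prefixes, and the source/dataset naming scheme are made up for illustration and do not reflect our real setup.

```python
# Hypothetical names throughout; this only illustrates the two layouts
# being compared, not an actual or recommended configuration.

def few_buckets_uri(source: str, dataset: str, filename: str,
                    bucket: str = "example-datalake-raw") -> str:
    """Option A: one shared bucket, with everything organized by key prefix."""
    return f"s3://{bucket}/{source}/{dataset}/{filename}"


def many_buckets_uri(source: str, dataset: str, filename: str,
                     prefix: str = "example-datalake") -> str:
    """Option B: a separate bucket for each source/dataset pair."""
    return f"s3://{prefix}-{source}-{dataset}/{filename}"


if __name__ == "__main__":
    # The same object addressed under each layout.
    print(few_buckets_uri("crm", "orders", "part-0000.csv"))
    print(many_buckets_uri("crm", "orders", "part-0000.csv"))
```

In option A the number of buckets stays fixed and the structure lives in the object keys; in option B every new source or dataset means a new bucket.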


All feedback is appreciated!