08-08-2017 07:47 PM
Our team is working on setting up a data lake in our Hadoop cluster. The data lake will be split into two storage layers: all raw/cold data will be stored in S3, while transformed/hot data will live in EBS-backed HDFS. The goal is to minimize EBS storage costs for cold data that is accessed infrequently.
I am looking for best practices/guidance on structuring S3 buckets and the objects within them. Are there any do's and don'ts here? Is it better to have a handful of buckets and organize everything inside them with key prefixes, or to create a large number of buckets?
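For context, one layout we are considering is the "handful of buckets" option, with the zone and partitions encoded in the key prefix. A rough sketch of what the key scheme might look like (the dataset name, partition scheme, and file name below are made-up examples, not our actual data):

```python
# Hypothetical single-bucket layout: the storage zone ("raw") and
# Hive-style date partitions are encoded in the object key prefix.
from datetime import date

def raw_key(dataset: str, day: date, filename: str) -> str:
    """Build a partitioned object key for the raw (cold) zone."""
    return (
        f"raw/{dataset}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )

print(raw_key("clickstream", date(2017, 8, 8), "part-0000.gz"))
# raw/clickstream/year=2017/month=08/day=08/part-0000.gz
```

The idea would be that query engines like Hive or Spark can prune partitions from prefixes like these, but I am unsure whether this is preferable to separate buckets per dataset.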
All feedback is appreciated!