We are just beginning our journey with CDH, and one of the things we're trying to figure out is how to design a folder/directory structure for our incoming data. Is it better to segregate by type or by source, or maybe by application or product? And would it be typical for the hierarchy to keep raw (just landed) data separate from staged and transformed data? I assume it might also be good to include a date hierarchy?
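To make the question concrete, here is a rough sketch of the kind of layout I'm picturing, not something we have settled on. Everything in it is an assumption for illustration: the `/data` root, the zone names, the `build_path` helper, the example source and dataset names, and the `year=/month=/day=` naming for the date levels.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical zone names: "raw" is just-landed data, followed by
# staged and transformed copies.
ZONES = ("raw", "staged", "transformed")


def build_path(zone: str, source: str, dataset: str, day: date,
               root: str = "/data") -> str:
    """Return a directory path of the form zone/source/dataset/date."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    return str(
        PurePosixPath(root) / zone / source / dataset
        / f"year={day:%Y}" / f"month={day:%m}" / f"day={day:%d}"
    )


print(build_path("raw", "crm", "orders", date(2024, 1, 15)))
# /data/raw/crm/orders/year=2024/month=01/day=15
```

Here the top level splits by processing stage and the next by source, with the date at the bottom. Whether that ordering is the right one, or whether application or product should come first, is exactly what I'm unsure about.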
If anyone has suggestions, or even a pointer to an article, I would appreciate it.