
HDFS Directory Structure Best Practices


Explorer

Hi,

Can someone point me to a good resource on best practices for a Hadoop directory structure for storing raw data, intermediate files, output files, metadata, etc. in HDFS? Do you segregate different data types into different directory structures? Are the directories labeled by date (YYMMDD)? What would a typical HDFS directory structure look like when setting up to store data?

1 ACCEPTED SOLUTION


Re: HDFS Directory Structure Best Practices

Rising Star
Eric Sammer (author of Hadoop Operations) has written a great answer on this topic here:

https://www.quora.com/What-is-the-best-directory-structure-to-store-different-types-of-logs-in-HDFS-...

Hadoop Operations is a great book and has quite a few good tricks.



Re: HDFS Directory Structure Best Practices

New Contributor

It depends on the data layers in your HDFS directory structure. For instance, if you have a raw layer and a standard layer, the following is one common practice.

 

Raw is the first landing zone for the data and needs to stay as close to the original data as possible. Standard is the staging layer, where the data is converted into different data formats but no semantic changes have yet been made to it.

 

The structure for raw data and metadata is:

    raw/businessarea/sourcesystem/data/date&time
    raw/businessarea/sourcesystem/meta/date&time

The structure for standard data and metadata is:

    standard/businessarea/sourcesystem/data/date&time
    standard/businessarea/sourcesystem/meta/date&time
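
As a rough illustration of the layout above, here is a minimal Python sketch that builds such paths from their components. The function name `layer_path`, the example business area and source system, and the timestamp format `%Y%m%d%H%M` are all assumptions made for the example; the post does not say how the date&time component should be formatted.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Layer and folder names taken from the layout described in the post.
LAYERS = {"raw", "standard"}
KINDS = {"data", "meta"}


def layer_path(layer, business_area, source_system, kind, ts,
               ts_format="%Y%m%d%H%M"):
    """Build <layer>/<businessarea>/<sourcesystem>/<data|meta>/<date&time>.

    ts_format is an assumed convention, not something the post prescribes.
    """
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    # PurePosixPath joins with "/" regardless of the local operating system.
    return str(PurePosixPath(layer, business_area, source_system, kind,
                             ts.strftime(ts_format)))


if __name__ == "__main__":
    # "sales" and "crm" are hypothetical names used only for illustration.
    when = datetime(2024, 1, 15, 10, 30)
    print(layer_path("raw", "sales", "crm", "data", when))
    print(layer_path("standard", "sales", "crm", "meta", when))
```

Generating paths through one helper like this keeps the raw and standard layers symmetrical, so the same business area and source system always resolve to matching directories in both layers.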

 

These conventions can also help when defining Sentry/Ranger policies based on AD groups.

 
