HDFS Directory Structure Best Practices
Labels: HDFS
Created on 01-17-2017 12:46 PM - edited 09-16-2022 03:55 AM
Hi-
Can someone point me to a good resource on best practices for a Hadoop directory structure for storing raw data, intermediate files, output files, metadata, etc. in HDFS? Do you segregate different data types into different directory structures? Are the directories labeled by date (YYMMDD)? What would a typical HDFS directory structure look like when setting up to store data?
Created 01-17-2017 01:01 PM
https://www.quora.com/What-is-the-best-directory-structure-to-store-different-types-of-logs-in-HDFS-...
Hadoop Operations is a great book and has quite a few good tricks.
Created 01-25-2019 04:55 AM
It depends on the data layers in your HDFS directory structure. For instance, if you have a raw layer and a standard layer, the following is one common practice.
Raw is the first landing zone for the data and should stay as close to the original data as possible. Standard is the staging layer, where the data is converted into different data formats but no semantic changes have been made to it yet.
The structure for raw data and metadata is:
raw/businessarea/sourcesystem/data/date&time
raw/businessarea/sourcesystem/meta/date&time
The structure for standard data and metadata is:
standard/businessarea/sourcesystem/data/date&time
standard/businessarea/sourcesystem/meta/date&time
These conventions also make it easier to define Sentry/Ranger policies based on AD groups.
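To make the layout above concrete, here is a minimal Python sketch that builds such paths. It is only an illustration under a few assumptions: the helper name `layer_path`, the example values `sales` and `crm`, the leading `/` root, and the `YYYYMMDD_HHMMSS` encoding of the date&time folder are all hypothetical, since the post does not specify them.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Layer and folder names taken from the layout described above.
LAYERS = ("raw", "standard")
KINDS = ("data", "meta")


def layer_path(layer, business_area, source_system, kind, ts, root="/"):
    """Build <root>/<layer>/<businessarea>/<sourcesystem>/<data|meta>/<date&time>."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    # Assumption: the post does not say how date&time is encoded,
    # so a sortable YYYYMMDD_HHMMSS stamp is used here.
    stamp = ts.strftime("%Y%m%d_%H%M%S")
    return str(PurePosixPath(root, layer, business_area, source_system, kind, stamp))


if __name__ == "__main__":
    landed = datetime(2019, 1, 25, 4, 55, 0)
    # Print the four directories for one (hypothetical) source system.
    for layer in LAYERS:
        for kind in KINDS:
            print(layer_path(layer, "sales", "crm", kind, landed))
```

Each resulting path could then be created with `hdfs dfs -mkdir -p <path>`. Keeping the layer, business area, and source system as the leading path components is what lets access policies be attached at a directory level.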
