Datasets and input/output events: what is the relationship between YEAR/MONTH/DAY and the instance?
- Labels: Apache Oozie
Created 10-04-2017 04:39 PM
Maybe it is obvious, but I was wondering:
Suppose we declare a dataset based on the date ($YEAR/$MONTH/$DAY/data, for example), use it in an output-events, and then consume it from an input-events whose "instance" points at current(0).
Is the dated directory name used directly to check the input event, or is there some kind of database inside Oozie that registers it? In other words, if we don't declare the output-events and simply create the "right" directory ourselves, will it still work?
Created 10-07-2017 03:59 PM
Yes, it will.
1. Generally, at the ingestion stage, data is collected at a minute, hourly, or daily level.
2. To keep data grouped by timestamp, one follows an HDFS path naming convention such as /a/b/b/yyyy/mm/dd.
3. The job that consumes this data for ETL needs to select a range of these paths, such as a week or a month, which is why datasets have YYYY/MM/DD as variable parameters.
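To make the point above concrete, here is a minimal Python sketch of the idea, not Oozie's actual implementation: the dataset's URI template is resolved to a concrete path for a given nominal time, and the input event is then satisfied by what exists on the filesystem. The function names, the `root` parameter, and the local-filesystem check are all illustrative assumptions. As far as I understand the coordinator documentation, when a dataset declares no done-flag Oozie waits for a `_SUCCESS` file inside the resolved directory, and an empty done-flag makes the directory itself the signal, so the sketch models both cases.

```python
from datetime import datetime
from pathlib import Path


def resolve_uri_template(template: str, nominal_time: datetime) -> str:
    """Replace Oozie-style date variables with zero-padded values."""
    fields = {
        "${YEAR}": f"{nominal_time:%Y}",
        "${MONTH}": f"{nominal_time:%m}",
        "${DAY}": f"{nominal_time:%d}",
        "${HOUR}": f"{nominal_time:%H}",
        "${MINUTE}": f"{nominal_time:%M}",
    }
    for variable, value in fields.items():
        template = template.replace(variable, value)
    return template


def instance_is_available(
    root: Path,
    template: str,
    nominal_time: datetime,
    done_flag: str = "_SUCCESS",
) -> bool:
    """Hypothetical availability check against a local directory tree.

    root stands in for the HDFS namespace (an assumption for this sketch).
    An empty done_flag means the directory itself is the signal.
    """
    resolved = root / resolve_uri_template(template, nominal_time).lstrip("/")
    target = resolved / done_flag if done_flag else resolved
    return target.exists()
```

For example, `resolve_uri_template("/a/b/${YEAR}/${MONTH}/${DAY}/data", datetime(2017, 10, 4))` gives `/a/b/2017/10/04/data`. Nothing in the check depends on which job created that path, which is why creating the directory by hand (plus the done-flag file, if one is expected) is enough.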
Created 10-09-2017 07:24 AM
Thanks 🙂