Datasets and input/output events: what is the correlation between YEAR/MONTH/DAY and the instance?

Maybe it is obvious, but I was wondering:

When we declare a date-based dataset ($YEAR/$MONTH/$DAY/data, for example) as an output-events, and consume it from an input-events whose "instance" watches current(0):

Is the dated directory name used directly to check the input event, or is there some kind of database inside Oozie that registers it? In other words, if we don't declare the output-events and simply create the "right" directory, will it still work?

1 ACCEPTED SOLUTION

Re: Datasets and input/output events: what is the correlation between YEAR/MONTH/DAY and the instance?

Yes, it will.

1. At the ingestion stage, data is generally collected at minute, hourly, or daily granularity.

2. To keep data grouped by timestamp, one follows an HDFS path naming convention such as /a/b/b/yyyy/mm/dd.

3. The job that consumes this data for ETL needs to select a range of these paths, such as a week or a month, which is why datasets have YYYY/MM/DD as variable parameters in them.
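
To make the link between YEAR/MONTH/DAY and the instance more concrete, here is a rough Python sketch of how a dataset URI template could be resolved for current(n). It is not Oozie code: the function names, the template path, and the dates are made up for illustration, and it assumes a fixed daily frequency in UTC. As far as I understand it, Oozie only checks whether the resolved path exists on HDFS (together with a done flag such as _SUCCESS, unless done-flag is set to empty), which is why a directory created by hand can satisfy the input event.

```python
from datetime import datetime, timedelta


def current_instance(initial, frequency, nominal, n=0):
    """Return the dataset instance time that current(n) would point to.

    Simplified model: instances start at `initial` and repeat every
    `frequency`; current(0) is the latest instance at or before `nominal`.
    """
    if nominal < initial:
        raise ValueError("nominal time precedes the dataset's initial instance")
    # Whole dataset periods elapsed between the initial instance and nominal time.
    elapsed = (nominal - initial) // frequency
    return initial + (elapsed + n) * frequency


def resolve_uri(template, instance):
    """Substitute the date variables of a URI template for one instance time."""
    values = {
        "YEAR": f"{instance.year:04d}",
        "MONTH": f"{instance.month:02d}",
        "DAY": f"{instance.day:02d}",
        "HOUR": f"{instance.hour:02d}",
        "MINUTE": f"{instance.minute:02d}",
    }
    for name, value in values.items():
        template = template.replace("${" + name + "}", value)
    return template


# Hypothetical dataset definition and coordinator action time.
template = "/a/b/${YEAR}/${MONTH}/${DAY}/data"
initial = datetime(2016, 1, 1)
daily = timedelta(days=1)
nominal = datetime(2016, 3, 15, 6, 30)

# Path that an input event with instance current(0) would wait for.
today_path = resolve_uri(template, current_instance(initial, daily, nominal, 0))

# A one-week range, i.e. start-instance current(-6) to end-instance current(0).
week_paths = [
    resolve_uri(template, current_instance(initial, daily, nominal, n))
    for n in range(-6, 1)
]

print(today_path)
print(week_paths)
```

In this model the instance is just a point in time derived from the initial instance and the frequency, and the directory name is that time formatted into the template. Nothing in it depends on which job, if any, declared the path as an output event.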

Re: Datasets and input/output events: what is the correlation between YEAR/MONTH/DAY and the instance?

thanks :-)