Created 07-11-2016 04:14 PM
https://falcon.apache.org/EntitySpecification.html#Feed_Specification

    <cluster name="test-cluster">
        <validity start="2012-07-20T03:00Z" end="2099-07-16T00:00Z"/>
        <retention limit="days(10)" action="delete"/>
        <sla slaLow="hours(3)" slaHigh="hours(4)"/>
        <locations>
            <location type="data" path="/hdfsDataLocation/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"/>
            <location type="stats" path="/projects/falcon/clicksStats" />
            <location type="meta" path="/projects/falcon/clicksMetaData" />
        </locations>
    </cluster>
For the data location, do we have to include the date pattern in the path? Could I just use a general location such as /hdfsDataLocation?
Created 07-11-2016 04:20 PM
If you use a general location, you can only use time-based scheduling (every 15 minutes, for example). You cannot use retention, late-data handling, and so on. Essentially, all of the advanced Oozie features for waiting on data to arrive are unavailable.
The feed definition essentially mimics Oozie datasets:
https://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html#a5._Dataset
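To illustrate why the date variables matter, here is a minimal sketch of a complete feed entity based on the cluster block from the question. It assumes an hourly feed; the feed name, description, ACL owner/group, and the late-arrival cut-off are hypothetical values chosen for illustration, so check the element order and attributes against the Falcon feed specification linked above before using it. Because each hourly instance resolves to its own dated directory, Falcon can delete instances older than the retention limit and can track late data per instance.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical feed: name, description and ACL values are illustrative only -->
<feed name="clicks-feed" description="hourly click data" xmlns="uri:falcon:feed:0.1">
    <!-- One feed instance per hour; each instance maps to one dated directory -->
    <frequency>hours(1)</frequency>
    <timezone>UTC</timezone>
    <!-- Assumed cut-off: late data for an instance is handled for up to 6 hours -->
    <late-arrival cut-off="hours(6)"/>
    <clusters>
        <cluster name="test-cluster" type="source">
            <validity start="2012-07-20T03:00Z" end="2099-07-16T00:00Z"/>
            <!-- Retention works because each instance's directory can be
                 resolved from the YEAR/MONTH/DAY/HOUR variables in the path -->
            <retention limit="days(10)" action="delete"/>
        </cluster>
    </clusters>
    <locations>
        <!-- Hourly granularity, so the path stops at HOUR -->
        <location type="data" path="/hdfsDataLocation/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
    </locations>
    <ACL owner="falcon" group="users" permission="0x755"/>
    <schema location="/none" provider="none"/>
</feed>
```

With a plain path such as /hdfsDataLocation, every instance resolves to the same directory, so there is nothing dated for retention or late-arrival handling to act on.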
Created 07-11-2016 04:14 PM
Without the date specification, the file will be overwritten with each replication, so you will have no history. If this is acceptable, simply specify the location as /hdfsDataLocation.
Created 07-11-2016 06:35 PM
I don't think that works. I tried with /hdfsDataLocation only, and retention doesn't work.