Falcon Feed File System Storage format

Expert Contributor
https://falcon.apache.org/EntitySpecification.html#Feed_Specification

<cluster name="test-cluster">
    <validity start="2012-07-20T03:00Z" end="2099-07-16T00:00Z"/>
    <retention limit="days(10)" action="delete"/>
    <sla slaLow="hours(3)" slaHigh="hours(4)"/>
    <locations>
        <location type="data" path="/hdfsDataLocation/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"/>
        <location type="stats" path="/projects/falcon/clicksStats"/>
        <location type="meta" path="/projects/falcon/clicksMetaData"/>
    </locations>
</cluster>

For the location of type "data", do we have to specify the date pattern in the path? Could I just use a general location such as /hdfsDataLocation?

1 ACCEPTED SOLUTION

Master Guru

If you use a general location, you can only use time-based scheduling (every 15 minutes, for example). You cannot use retention, late-arrival handling, and so on. All of the advanced features in Oozie that wait for data to arrive are essentially unavailable.

Falcon feeds essentially mimic datasets in Oozie:

https://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html#a5._Dataset
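
To illustrate the comparison, here is a rough sketch of what a matching Oozie dataset definition might look like. The dataset name, the namenode address, the hourly frequency, and the done-flag are assumptions made for illustration; only the path pattern and start time come from the feed in the question.

```xml
<datasets>
    <!-- Hypothetical dataset name and frequency, chosen for illustration only -->
    <dataset name="hdfsData" frequency="${coord:hours(1)}"
             initial-instance="2012-07-20T03:00Z" timezone="UTC">
        <!-- The date variables let each time instance resolve to its own directory -->
        <uri-template>hdfs://namenode:8020/hdfsDataLocation/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
        <!-- Assumed marker file that signals an instance is complete -->
        <done-flag>_SUCCESS</done-flag>
    </dataset>
</datasets>
```

Because every instance maps to a distinct dated directory, the scheduler can tell which instances exist, which are late, and which are old enough to delete. A path without date variables gives it nothing to distinguish one instance from another.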


REPLIES

Rising Star

Without the date specification, the file will be overwritten with each replication, so you will have no history. If this is acceptable, simply specify the location as /hdfsDataLocation.
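
As a sketch of that suggestion, the locations block from the question would change only in the data path, with the other two locations left as they were. Note that, according to the accepted solution, retention and late-arrival handling are not expected to work with a path like this.

```xml
<locations>
    <!-- General location with no date variables: each run targets the same directory -->
    <location type="data" path="/hdfsDataLocation"/>
    <location type="stats" path="/projects/falcon/clicksStats"/>
    <location type="meta" path="/projects/falcon/clicksMetaData"/>
</locations>
```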

Expert Contributor

I don't think that works. I tried with only /hdfsDataLocation, and retention doesn't work.
