Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Falcon Feed File System Storage format

Expert Contributor
https://falcon.apache.org/EntitySpecification.html#Feed_Specification

<cluster name="test-cluster">
    <validity start="2012-07-20T03:00Z" end="2099-07-16T00:00Z"/>
    <retention limit="days(10)" action="delete"/>
    <sla slaLow="hours(3)" slaHigh="hours(4)"/>
    <locations>
        <location type="data" path="/hdfsDataLocation/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"/>
        <location type="stats" path="/projects/falcon/clicksStats"/>
        <location type="meta" path="/projects/falcon/clicksMetaData"/>
    </locations>
</cluster>

For the location of type "data", do we have to specify the date pattern? Could I just use a general location such as /hdfsDataLocation?

1 ACCEPTED SOLUTION

Master Guru

If you use a general location, you can only use time-based scheduling (every 15 minutes, for example). You cannot use retention, late-arrival handling, and similar features. All of the advanced Oozie features that wait for data to arrive are essentially out.

Falcon feeds essentially mimic the datasets in Oozie:

https://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html#a5._Dataset
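
For illustration, here is a minimal sketch of an Oozie dataset that would correspond to the dated feed path in the question. The dataset name, the NameNode address, the 15-minute frequency, and the done-flag are assumptions made for the example; none of them come from the original feed definition.

```xml
<!-- Hypothetical Oozie dataset mirroring the dated Falcon location above. -->
<!-- "clicks", "namenode:8020", the frequency and the done-flag are assumed values. -->
<dataset name="clicks" frequency="${coord:minutes(15)}"
         initial-instance="2012-07-20T03:00Z" timezone="UTC">
    <!-- Each time variable is resolved per instance, giving one directory per period. -->
    <uri-template>hdfs://namenode:8020/hdfsDataLocation/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
    <!-- A dependent job waits until this flag file exists in the instance directory. -->
    <done-flag>_SUCCESS</done-flag>
</dataset>
```

With a plain /hdfsDataLocation template there are no time variables to resolve, so there are no dated instances for retention or late-data handling to act on.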


3 REPLIES

Rising Star

Without the date specification, the file will be overwritten with each replication, so you will have no history. If this is acceptable, simply specify the location as /hdfsDataLocation.

Expert Contributor

I don't think that works. I tried with only /hdfsDataLocation, and retention doesn't work.
