How to trigger oozie workflow when input data with dynamic name is available?

I am building an Oozie coordinator that should trigger when input data with a dynamic name becomes available. Here is the coordinator.xml:

 <coordinator-app name="${jobName} Coordinator" frequency="${coord:days(1)}" start="${startTime}" end="2099-01-01T00:00Z" timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
   <datasets>
    <dataset name="gaSchema" frequency="30" initial-instance="${startTime}" timezone="UTC">
      <uri-template>${nameNode}/ga/bySchema/</uri-template>
      <done-flag>ga_${YEAR}${MONTH}${DAY}.avro</done-flag>
    </dataset>
  </datasets>
  <input-events>
      <data-in name="coordInput1" dataset="gaSchema">
          <start-instance>${coord:current(-23)}</start-instance>
          <end-instance>${coord:current(0)}</end-instance>
      </data-in>
   </input-events>
   <action>
      <workflow>
         <app-path>${wfApplicationPath}</app-path>
         <configuration>
            <property><name>date</name><value>${coord:formatTime(coord:nominalTime(), "yyyyMMdd")}</value></property>
            <property><name>jobTracker</name><value>${jobTracker}</value></property>
            <property><name>nameNode</name><value>${nameNode}</value></property>
            <property><name>jobName</name><value>${jobName}</value></property>  
          </configuration>
      </workflow>
   </action>
</coordinator-app>

The workflow should be triggered when the file carrying the current date arrives in the HDFS folder:

  <done-flag>ga_${YEAR}${MONTH}${DAY}.avro</done-flag>

This did not work with a dynamic file name. From what I found online, it seems to work only with a dynamic folder and a fixed file name, for example:

 <uri-template>${nameNode}/ga/bySchema/${YEAR}${MONTH}${DAY}</uri-template>
  <done-flag>ga.avro</done-flag>

In that case, I would have to create a lot of folders on HDFS, because we import data every day.

Does anyone have an idea how to trigger an Oozie workflow when input data with a dynamic name is available?

Thanks

1 ACCEPTED SOLUTION

I found a workaround. Whenever a file is dropped into the folder /ga/bySchema/, a _SUCCESS file is also created in that folder, and the dataset uses <done-flag>_SUCCESS</done-flag>, which triggers the workflow. In workflow.xml, I move the file (ga_${today}.avro) into an archive folder and delete the _SUCCESS file. ${today} is defined in coordinator.xml. It is now working fine.
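
For anyone trying to reproduce this, here is a rough sketch of how the two pieces might look. It is not the exact configuration from my job. The archive path /ga/archive, the action and workflow names, the single current(0) instance, and the daily dataset frequency are all assumptions made for illustration. The coordinator side waits for the fixed-name flag and passes the date down as ${today}:

```xml
<!-- coordinator.xml fragment (sketch): wait for a fixed-name flag file -->
<datasets>
  <dataset name="gaSchema" frequency="${coord:days(1)}" initial-instance="${startTime}" timezone="UTC">
    <!-- No date variables here, so every instance resolves to the same folder -->
    <uri-template>${nameNode}/ga/bySchema</uri-template>
    <done-flag>_SUCCESS</done-flag>
  </dataset>
</datasets>
<input-events>
  <data-in name="coordInput1" dataset="gaSchema">
    <instance>${coord:current(0)}</instance>
  </data-in>
</input-events>
<!-- Inside <action><workflow><configuration>: pass the date to the workflow -->
<property>
  <name>today</name>
  <value>${coord:formatTime(coord:nominalTime(), "yyyyMMdd")}</value>
</property>
```

The workflow side could then clear the flag and archive the file with an fs action, assuming the archive folder already exists:

```xml
<!-- workflow.xml (sketch): the names and the /ga/archive path are hypothetical -->
<workflow-app name="ga-archive-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="archive-input"/>
  <action name="archive-input">
    <fs>
      <!-- Remove the flag so the next coordinator run waits for a new one -->
      <delete path="${nameNode}/ga/bySchema/_SUCCESS"/>
      <!-- Move the dated input file out of the watched folder -->
      <move source="${nameNode}/ga/bySchema/ga_${today}.avro" target="${nameNode}/ga/archive/ga_${today}.avro"/>
    </fs>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Archiving failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

In a real job the processing actions would run before this step. The important part is that the flag is deleted on every run, otherwise the next coordinator action would find the old _SUCCESS file and fire immediately.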
