How to trigger an Oozie workflow when input data with a dynamic name is available?
Labels: Apache Oozie
Created ‎09-27-2016 09:42 AM
I am setting up an Oozie coordinator that should trigger when input data with a dynamic name becomes available. Here is the coordinator.xml:
<coordinator-app name="${jobName} Coordinator" frequency="${coord:days(1)}"
                 start="${startTime}" end="2099-01-01T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <dataset name="gaSchema" frequency="30" initial-instance="${startTime}" timezone="UTC">
      <uri-template>${nameNode}/ga/bySchema/</uri-template>
      <done-flag>ga_${YEAR}${MONTH}${DAY}.avro</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="coordInput1" dataset="gaSchema">
      <start-instance>${coord:current(-23)}</start-instance>
      <end-instance>${coord:current(0)}</end-instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${wfApplicationPath}</app-path>
      <configuration>
        <property><name>date</name><value>${coord:formatTime(coord:nominalTime(), "yyyyMMdd")}</value></property>
        <property><name>jobTracker</name><value>${jobTracker}</value></property>
        <property><name>nameNode</name><value>${nameNode}</value></property>
        <property><name>jobName</name><value>${jobName}</value></property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
The workflow should be triggered when the file with the current date arrives in an HDFS folder:
<done-flag>ga_${YEAR}${MONTH}${DAY}.avro</done-flag>
It didn't work with a dynamic file name. From what I found online, it seems to work only with a dynamic folder and a fixed file name, for example:
<uri-template>${nameNode}/ga/bySchema/${YEAR}${MONTH}${DAY}</uri-template> <done-flag>ga.avro</done-flag>
In that case, I would have to create a lot of folders on HDFS, because we import data every day.
Do you have any ideas on how to trigger an Oozie workflow when input data with a dynamic name is available?
Thanks
Created ‎10-05-2016 11:14 AM
I found a workaround. When a file is dropped into the folder /ga/bySchema/, a _SUCCESS file is also created in that folder. The coordinator uses <done-flag>_SUCCESS</done-flag>, so the presence of _SUCCESS triggers the workflow. In workflow.xml, I move the file (ga_${today}.avro) into an archive folder and delete the _SUCCESS file. ${today} is defined in coordinator.xml. It is working fine now.
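For anyone trying the same thing, here is a rough sketch of how the workaround could look. It is not the exact configuration from this thread. It assumes that the process writing ga_${today}.avro also creates the _SUCCESS file, that ${today} is passed from the coordinator to the workflow as a property, and that the archive folder is /ga/archive (a hypothetical path). The action and node names are made up for illustration.

```xml
<!-- coordinator.xml (dataset only): fixed folder, fixed done-flag -->
<datasets>
  <dataset name="gaSchema" frequency="${coord:days(1)}"
           initial-instance="${startTime}" timezone="UTC">
    <uri-template>${nameNode}/ga/bySchema</uri-template>
    <!-- The coordinator action waits until this file exists in the folder -->
    <done-flag>_SUCCESS</done-flag>
  </dataset>
</datasets>

<!-- workflow.xml: clean up so that the next day's _SUCCESS triggers a new run -->
<workflow-app name="ga-archive-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="archive-input"/>
  <action name="archive-input">
    <fs>
      <!-- Remove the flag so a stale _SUCCESS cannot satisfy the next instance -->
      <delete path="${nameNode}/ga/bySchema/_SUCCESS"/>
      <!-- /ga/archive is an assumed location; ${today} is an assumed workflow property -->
      <move source="${nameNode}/ga/bySchema/ga_${today}.avro"
            target="/ga/archive/ga_${today}.avro"/>
    </fs>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Archiving failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

In a real workflow the processing action would run before the cleanup step. The point of deleting _SUCCESS is that the folder is shared by every day's file, so leaving the flag in place would let later coordinator instances start before their data has arrived.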
