
Dependent Hive processes



I'd like to know whether the following is possible with Falcon:

HiveFeed1 -> Process1 -> HiveFeed2 -> Process2 -> HiveFeed3

Is it possible to tell Process2 not to start before Process1 has finished?

HCat provides notifications when a partition is added, but I do not know whether that is a solution. If I have a long-running Hive process that adds a partition to Hive (e.g. alter table add partition...), does HCat send a partition-added notification right away, so that Process2 thinks it can start?

Br, Margusja


That is not how it works. Falcon (using Oozie coordinators) can wait for partitions to be added, but it does so by polling: the job is scheduled at its start time, and Oozie then checks every 60 seconds (configurable) whether the partition/folder for that job exists. For example, if you have hourly partitions and the partition gets created sometime during that hour, you would simply schedule the job at the beginning of the hour, and Oozie would check every minute whether the folder/partition has been created.
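To make the polling behaviour concrete, here is a minimal Python simulation, assuming simulated time in seconds instead of a real clock. `SimulatedMetastore`, `wait_for_partition`, and the partition name are hypothetical illustrations, not Falcon's or Oozie's API; the sketch only mirrors the "start at the scheduled time, then re-check every interval" logic described above.

```python
class SimulatedMetastore:
    """Hypothetical stand-in for the Hive metastore (not a real API).

    Maps a partition name to the simulated time (in seconds) at which it appears.
    """

    def __init__(self, partition_created_at):
        self.partition_created_at = dict(partition_created_at)

    def partition_exists(self, partition, now):
        created = self.partition_created_at.get(partition)
        return created is not None and created <= now


def wait_for_partition(metastore, partition, nominal_start, poll_interval=60, timeout=3600):
    """Return the simulated time at which the job would start, or None on timeout.

    Hypothetical helper: the job is created at its nominal start time and
    re-checks input availability every poll_interval seconds until timeout.
    """
    now = nominal_start
    while now <= nominal_start + timeout:
        if metastore.partition_exists(partition, now):
            return now
        now += poll_interval
    return None


# An hourly partition that appears 17.5 minutes (1050 s) into the hour.
metastore = SimulatedMetastore({"dt=2016-01-01-10": 1050})
start = wait_for_partition(metastore, "dt=2016-01-01-10", nominal_start=0)
print(start)  # 1080: the first 60-second check after the partition appears
```

Note the consequence for the original question: the check only tests whether the partition exists, so the job can start up to one polling interval after the partition appears, and never before it appears.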


Thanks for the answer. I do not want to depend on timing measured in seconds, so using Oozie as the execution engine seems the safest choice; there I can chain the processes however I want. I hope that is the right way to do it.

@Margus Roo

If HiveFeed2 is the output of Process1 and the input to Process2, Process2 will wait until its input feed has been generated, which establishes the pipeline. Please take a look at the documentation on defining a Falcon end-to-end pipeline to understand how to define a pipeline using Falcon.
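A minimal Python sketch of why declaring feeds is enough to order the pipeline: if each process lists the feeds it reads and writes, Process2 depends on whichever process produces HiveFeed2. The dictionary layout and `process_dependencies` are hypothetical illustrations, not Falcon's entity format. Falcon expresses the same relationship through the inputs and outputs of its process entities, and at runtime the "wait" is the availability check on the input feed rather than a precomputed order.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical description of the pipeline from the question:
# HiveFeed1 -> Process1 -> HiveFeed2 -> Process2 -> HiveFeed3
processes = {
    "Process1": {"inputs": ["HiveFeed1"], "outputs": ["HiveFeed2"]},
    "Process2": {"inputs": ["HiveFeed2"], "outputs": ["HiveFeed3"]},
}


def process_dependencies(processes):
    """Map each process to the set of processes that produce its input feeds."""
    producer = {}
    for name, spec in processes.items():
        for feed in spec["outputs"]:
            producer[feed] = name
    deps = {}
    for name, spec in processes.items():
        # Feeds with no producing process (e.g. HiveFeed1) are external inputs.
        deps[name] = {producer[f] for f in spec["inputs"] if f in producer}
    return deps


order = list(TopologicalSorter(process_dependencies(processes)).static_order())
print(order)  # ['Process1', 'Process2']
```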