Understanding Oozie active-active architecture

Rising Star

Hi,

I have some jobs that must run every night; they are scheduled in Oozie. Once I enable Oozie HA, the Oozie servers will share this scheduling.

My question is: will these scheduled jobs be executed twice? (I suppose not, but why?)

And should I change the value of the variable oozie_base_url for these jobs to localhost or to my load balancer address?

1 ACCEPTED SOLUTION

Master Guru

Hi @jean rivera, yes, the Oozie servers "share" the scheduling by connecting to the same Oozie database. Scheduled jobs are executed only once; the server that runs a job is selected randomly, and coordination is done using ZooKeeper distributed locks. Regarding oozie_base_url, set it to the URL of your load balancer. More details about Oozie active-active architecture here. And finally, to use Oozie HA you must replace the default Derby database shipped with Oozie with a "real" one that supports concurrent queries, such as MySQL or PostgreSQL.
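To illustrate why a shared database plus a lock means each job runs only once, here is a minimal Python sketch. It is a simulation under stated assumptions, not Oozie's actual code: a threading.Lock stands in for the ZooKeeper distributed lock, an in-memory set stands in for the shared database, and the server and job names are hypothetical.

```python
import threading


class SharedJobStore:
    """Simulated shared Oozie database guarded by a simulated distributed lock."""

    def __init__(self, job_ids):
        self._lock = threading.Lock()  # stand-in for the ZooKeeper lock (assumption)
        self._pending = set(job_ids)   # scheduled jobs waiting to run
        self.executed = []             # (job_id, server) pairs, one per job run

    def try_claim(self, job_id, server):
        # Only one server holds the lock at a time, so check-and-update is atomic.
        with self._lock:
            if job_id not in self._pending:
                return False           # the other server already took this job
            self._pending.remove(job_id)
            self.executed.append((job_id, server))
            return True


def server_loop(name, store, job_ids):
    # Each server sees the same schedule and tries to claim every job.
    for job_id in job_ids:
        store.try_claim(job_id, name)


jobs = ["nightly-etl", "nightly-report"]  # hypothetical job names
store = SharedJobStore(jobs)
threads = [
    threading.Thread(target=server_loop, args=(name, store, jobs))
    for name in ("server-a", "server-b")   # hypothetical server names
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(job for job, _ in store.executed))
```

Whichever server wins the lock first runs the job; the other one finds it already claimed and skips it, so every job appears exactly once in the executed list.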

Rising Star

Hi @Predrag Minovic, thank you for the answer. Can you please tell me whether there is anything wrong with not changing oozie_base_url for the scheduled jobs, for example leaving it as localhost:11000/oozie?

I think that for scheduled jobs, in terms of HA, it makes no difference whether or not I use the load balancer as oozie_base_url, because with two Oozie servers sharing the same information, one of them will execute the job. The load balancer only becomes relevant when we have on-demand jobs.

Am I on the right track?

Thx @Kuldeep Kulkarni for the very relevant info. I am planning to use Kerberos.

Master Guru

First of all, you should not use "localhost" anywhere in your settings; use FQDNs. Let's call your Oozie HA servers oz1 and oz2, and your LB ozlb (these are all FQDNs). If you keep addressing Oozie as, say, oz1, then if oz1 goes down you lose Oozie functionality. Avoiding that is the main reason to do HA, and the main reason to set OOZIE_BASE_URL=ozlb: the LB can detect a failed Oozie server and redirect the traffic to the other one. Of course, the LB also does load balancing when both oz1 and oz2 are healthy (jobs on demand). HTH.
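The failover behaviour described above can be sketched in a few lines of Python. This is only an illustration of what a load balancer does, not how any particular LB product is implemented: the host names are hypothetical FQDNs, and the health check is a plain dictionary lookup instead of a real probe.

```python
from itertools import cycle


def make_balancer(backends, is_healthy):
    """Return a function that picks the next healthy backend, round-robin."""
    ring = cycle(backends)

    def pick():
        # Try each backend at most once per request before giving up.
        for _ in range(len(backends)):
            candidate = next(ring)
            if is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy Oozie server available")

    return pick


# Hypothetical FQDNs for the two Oozie servers; True means "healthy".
health = {"oz1.example.com": True, "oz2.example.com": True}
pick = make_balancer(list(health), lambda host: health[host])

both_up = [pick() for _ in range(4)]        # requests alternate between oz1 and oz2
health["oz1.example.com"] = False           # simulate oz1 going down
after_failure = [pick() for _ in range(3)]  # all requests now go to oz2

print(both_up)
print(after_failure)
```

A client that talks to oz1 directly has no such fallback, which is why pointing it at the LB address matters.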

Rising Star

Thx @Predrag Minovic

One last question: is the variable OOZIE_BASE_URL, set in oozie-site or oozie-env, global for all scheduled jobs, or must each job define its own OOZIE_BASE_URL?

Master Guru

No, not each job, but you must be careful how you, as I said, "address Oozie". If you run Oozie commands from the command line, you need to use the ozlb address as your Oozie server, or export OOZIE_URL=ozlb. And if you use Falcon, then when you define a logical cluster you need to use ozlb as your Oozie server address.
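For a script that talks to Oozie itself, the same idea might look like the Python sketch below: take an explicitly given URL if there is one, otherwise fall back to the OOZIE_URL environment variable, and fail loudly if neither is set. The helper name, the LB host name and the port are assumptions for illustration; the status path is the one I believe the Oozie web services API uses, so check it against your version.

```python
import os


def resolve_oozie_url(explicit_url=None, env=os.environ):
    """Pick the Oozie URL: an explicit value first, then the OOZIE_URL variable."""
    url = explicit_url or env.get("OOZIE_URL")
    if not url:
        raise ValueError("no Oozie URL given and OOZIE_URL is not set")
    return url.rstrip("/")


# Hypothetical environment pointing at the load balancer, not at oz1 or oz2.
example_env = {"OOZIE_URL": "http://ozlb.example.com:11000/oozie/"}

base = resolve_oozie_url(env=example_env)
status_url = base + "/v1/admin/status"  # admin status path (assumed, verify for your version)
print(status_url)
```

Because the URL comes from one place, switching from a single server to the LB is a one-line change in the environment rather than an edit in every script.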

Master Guru

@jean rivera

In addition to the answer given by Predrag: if you are interested in setting up Oozie HA with a load balancer in a Kerberized environment, you can refer to the article below.

http://crazyadmins.com/oozie-ha-configuration-with-kerberos/