We have set up Oozie through CM in our cluster. Lately, we noticed a workflow that runs longer than usual through Oozie. I ran the same workflow through the shell and it succeeded pretty quickly. Through the Oozie Web UI, I noticed that oozie.zookeeper.connection.string is pointing to localhost:2181 instead of the ZooKeeper quorum.
Through CM, we have pointed Oozie at the ZooKeeper service, so it should pick up the ZooKeeper hosts without our having to override anything in the oozie-site.xml file. Is this a bug?
Oozie only uses ZooKeeper when it is configured for HA (High Availability). CM only emits oozie.zookeeper.connection.string when HA is enabled, even though the ZooKeeper Service option always shows up on Oozie's Configuration page in CM. When CM doesn't emit this property, Oozie falls back to the default, which is localhost, but Oozie isn't actually using it or talking to ZooKeeper.
To investigate your slower job, you should look through the Oozie Server logs and the action output in YARN to figure out what's actually happening.
2015-10-19 00:15:11,477 INFO [main] org.apache.zookeeper.ZooKeeper: Client environment:host.name=host1
2015-10-19 00:15:11,478 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x57101ba40x0, quorum=localhost:2181, baseZNode=/hbase
2015-10-19 00:15:11,490 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-10-19 00:15:11,491 WARN [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
When I run it through Oozie, I see many warnings like the ones above, showing that it is using localhost. When I run it through the shell, I see that it's using the correct hostnames from the ZooKeeper quorum instead of localhost.
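
For anyone comparing the two runs the same way, here is a minimal sketch of how the connect strings could be pulled out of a saved log. It assumes the log contains the standard "Initiating client connection, connectString=..." line shown above; the function names connect_strings and uses_localhost are hypothetical helpers, not part of Oozie or ZooKeeper. The watcher=hconnection and baseZNode=/hbase fields in that line suggest it comes from an HBase client inside the job rather than from Oozie itself, but that is an inference from the log, not something confirmed here.

```python
import re
from collections import Counter

# Matches the ZooKeeper client line that reports the connect string in use.
CONNECT_RE = re.compile(r"Initiating client connection, connectString=(\S+)")


def connect_strings(log_text):
    """Count each ZooKeeper connectString that appears in the log text."""
    return Counter(CONNECT_RE.findall(log_text))


def uses_localhost(log_text):
    """Return True if any client connection targets localhost or 127.0.0.1."""
    return any(
        host.split(":")[0] in ("localhost", "127.0.0.1")
        for conn in connect_strings(log_text)
        for host in conn.split(",")
    )


# Sample line copied from the log above.
sample = (
    "2015-10-19 00:15:11,478 INFO [main] org.apache.zookeeper.ZooKeeper: "
    "Initiating client connection, connectString=localhost:2181 "
    "sessionTimeout=90000 watcher=hconnection-0x57101ba40x0, "
    "quorum=localhost:2181, baseZNode=/hbase"
)

print(connect_strings(sample))
print(uses_localhost(sample))
```

Running this over the log from the Oozie launch and the log from the shell launch should make it easy to see which run falls back to localhost and which one picks up the quorum hosts.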