Created 08-16-2016 10:24 AM
We are running an Oozie hive2 action which discovers HiveServer2 via ZooKeeper. Below is the hive2 action snippet:
<hive2 xmlns="uri:oozie:hive2-action:0.1">
    <prepare>
        <delete path="${WF_OUTPUT_PATH}-${wf:id()}/_query1"/>
    </prepare>
    <jdbc-url>jdbc:hive2://zookeeper:2181/table;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2</jdbc-url>
    <password>dummy</password>
    <script>queries/_query1.sql</script>
    <param>outputDir=${WF_OUTPUT_PATH}-${wf:id()}/_query1</param>
    <argument>--hiveconf</argument>
    <argument>tez.queue.name=${HIVE_QUEUE}</argument>
    <argument>--hiveconf</argument>
    <argument>hive.query.name=tpch_query1</argument>
    <argument>-i</argument>
    <argument>testbench.settings</argument>
    <file>testbench.settings</file>
</hive2>
The MR job which runs the Hive query via Beeline is failing because of a connection timeout to ZooKeeper:
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198)
    at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
    at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:474)
    at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:214)
    at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:203)
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
    at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:199)
    at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:191)
    at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:38)
    at org.apache.hive.jdbc.ZooKeeperHiveClientHelper.configureConnParams(ZooKeeperHiveClientHelper.java:63)
    at org.apache.hive.jdbc.Utils.configureConnParams(Utils.java:509)
    at org.apache.hive.jdbc.Utils.parseURL(Utils.java:429)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:134)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:146)
    at org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:211)
    at org.apache.hive.beeline.Commands.close(Commands.java:1002)
    at org.apache.hive.beeline.Commands.closeall(Commands.java:984)
    at org.apache.hive.beeline.BeeLine.close(BeeLine.java:845)
    at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:792)
    at org.apache.oozie.action.hadoop.Hive2Main.runBeeline(Hive2Main.java:266)
    at org.apache.oozie.action.hadoop.Hive2Main.run(Hive2Main.java:240)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:51)
    at org.apache.oozie.action.hadoop.Hive2Main.main(Hive2Main.java:58)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:242)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
2016-08-16 08:04:21,196 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=zookeeper:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@61d01788
2016-08-16 08:04:21,196 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn: EventThread shut down
2016-08-16 08:04:21,199 INFO [main-SendThread(XXXX:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server XXXXX:2181. Will not attempt to authenticate using SASL (unknown error)
2016-08-16 08:04:21,199 INFO [main-SendThread(XXXX:2181)] org.apache.zookeeper.ClientCnxn: Socket connection established to XXXXX:2181, initiating session
2016-08-16 08:04:21,199 INFO [main-SendThread(XXXX:2181)] org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2016-08-16 08:04:21,745 INFO [main] org.apache.zookeeper.ZooKeeper: Session: 0x0 closed
2016-08-16 08:04:21,745 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn: EventThread shut down
Is there any way to increase the timeout? What would be the configuration for this?
Created 08-17-2016 06:54 AM
While I could not find a configuration to control the timeout, we troubleshot why ZooKeeper was taking more than 60 seconds, and it turned out that ZooKeeper was rate limiting the connections. Here is a good article which explains the concepts.
We ended up identifying the rogue app that was causing the connection leak to ZK.
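For anyone hunting a similar leak, here is a rough sketch of how the offending client could be spotted. It assumes the limit being hit is ZooKeeper's per-IP connection cap (`maxClientCnxns`), which the post does not confirm, and that the `cons` four-letter-word command is enabled on the server (newer ZooKeeper versions require it to be whitelisted). The function names and the sample output below are made up for illustration; real `cons` lines may carry more fields.

```python
import re
import socket
from collections import Counter

# Each connection line from `cons` starts with "/<client-ip>:<port>[...]".
_CLIENT_RE = re.compile(r"^\s*/(\S+?):\d+\[")

# Illustrative sample only, not captured from a real server.
SAMPLE_CONS_OUTPUT = """\
 /10.0.0.5:51234[1](queued=0,recved=12,sent=12)
 /10.0.0.5:51240[1](queued=0,recved=3,sent=3)
 /10.0.0.9:40022[0](queued=0,recved=1,sent=0)
"""


def count_connections_by_ip(cons_output):
    """Count open connections per client IP in `cons` output text."""
    counts = Counter()
    for line in cons_output.splitlines():
        match = _CLIENT_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def fetch_cons(host, port=2181, timeout=5.0):
    """Send the `cons` command to a ZooKeeper server and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"cons")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the socket after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Against a live server: count_connections_by_ip(fetch_cons("zookeeper"))
    for ip, n in count_connections_by_ip(SAMPLE_CONS_OUTPUT).most_common():
        print(ip, n)
```

A client IP whose count sits at or near the configured `maxClientCnxns` is a likely candidate for the leak.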
Created 08-17-2016 01:38 PM
Increasing the 'tickTime' value of ZK helps to reduce ConnectionLoss caused by delayed or missed heartbeats; basically, it increases the session timeout.
From the ZooKeeper documentation, tickTime is the basic time unit in milliseconds used by ZooKeeper. It is used to do heartbeats, and the minimum session timeout will be twice the tickTime.
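To make the relationship concrete, here is a small sketch of how the server bounds the session timeout a client asks for. By default ZooKeeper keeps it between 2 x tickTime (`minSessionTimeout`) and 20 x tickTime (`maxSessionTimeout`). The function name is hypothetical, and the tickTime of 2000 ms is an assumption (a commonly used value); the 60000 ms request is the `sessionTimeout` from the log in the question.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms,
                               min_ms=None, max_ms=None):
    """Clamp a requested session timeout to the server's allowed range.

    When minSessionTimeout / maxSessionTimeout are not set explicitly,
    they default to 2 x tickTime and 20 x tickTime respectively.
    """
    lower = 2 * tick_time_ms if min_ms is None else min_ms
    upper = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lower, min(requested_ms, upper))


if __name__ == "__main__":
    # The client in the question requested 60000 ms.
    print(negotiated_session_timeout(60000, 2000))  # capped at 20 x 2000
    print(negotiated_session_timeout(60000, 3000))  # fits within 20 x 3000
```

Under those assumptions the 60000 ms request would be capped at 40000 ms, while a tickTime of 3000 ms would allow the full 60000 ms. Setting `maxSessionTimeout` directly is another way to raise the cap without changing the heartbeat interval.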