I am facing an intermittent problem when I connect with beeline using a ZooKeeper (ZK) connection string.
beeline -u "jdbc:hive2://myZK1:2181,myZK2:2181,myZK3:2181,myZK4:2181,myZK5:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
What I noticed is that, with this string, I always connect to myZK1.
Sometimes every query hangs and takes a very long time to return anything.
I changed the beeline connection string, removing the first ZK (myZK1), and I was able to query normally (using myZK2).
I suspect the first ZK is overloaded.
I ran the following command and got almost 400 connections on ZK1 and about 250 on ZK2:
# echo cons | nc localhost 2181 | wc -l
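To compare every ensemble member in one pass instead of logging in to each host, a rough sketch like the one below could be used. It assumes that each connection in the cons output is printed on its own line starting with "/ip:port", and the helper names count_cons and cons_per_host are made up for illustration.

```shell
#!/bin/sh
# Count client connections in the output of ZooKeeper's "cons" command.
# Assumption: each connection is printed on its own line, starting with
# optional whitespace followed by "/ip:port". Other lines are ignored.
count_cons() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the exit status clean.
    grep -c '^[[:space:]]*/' || true
}

# Print one "host count" line per ensemble member.
# Hypothetical helper: port 2181 is taken from the connection string above.
cons_per_host() {
    for host in "$@"; do
        printf '%s %s\n' "$host" "$(echo cons | nc "$host" 2181 | count_cons)"
    done
}

# Usage (needs network access to the ensemble):
# cons_per_host myZK1 myZK2 myZK3 myZK4 myZK5
```

Unlike the plain wc -l, this counts only the connection lines, so blank lines in the output do not inflate the number.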
Does ZooKeeper have an internal load balancer? Why do I always connect to the first ZK when I use that connection string?
If my suspicion about overload is right, what can I do? Could a load balancer such as HAProxy solve this?
Thanks in advance!
Hey bro @Eric Leme!
Could you please enable DEBUG logging for beeline?
Another thing that may help us here is a thread dump from HS2 taken while these delays occur:
jstack -l <PID>
About ZooKeeper, try running the following command:
echo wchs | nc 127.0.0.1 2181
The command above shows a summary of the watches: how many connections are watching how many paths, and the total number of watches.
To get more details (if the number shown is not too high), you can run the following command, which lists the watches by path, to check your hypothesis, because the ZKs might also be serving other components like Kafka, HDFS HA and so on.
echo wchp | nc 127.0.0.1 2181
About the last question: AFAIK ZooKeeper is only used to point you to an active HS2 instance (HS2 HA). The exception is when you have transactions enabled for Hive, in which case HS2 also relies on ZK to take care of locks (tables, databases, etc.).
PS: if you aren't allowed to share some details, then just paste the ERROR/WARN/FATAL/LOCK lines 🙂
Hope this helps!
Hello dude @Vinicius Higa Murakami
I tried to enable debug mode, but the problem happens after I run a beeline command, so it just hangs without logging anything.
The output of jstack -l <hiveserver PID> has too much information, but the last line I got was:
JNI global references: 399
echo wchs | nc 127.0.0.1 2181
335 connections watching 2300 paths
Total watches:22724
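To make these numbers easier to compare between ZK hosts, the wchs summary can be reduced to a couple of ratios. This is only a sketch, assuming the two-line output format shown above; wchs_summary is a made-up helper name.

```shell
#!/bin/sh
# Summarise "wchs" output: connections, paths, total watches and two ratios.
# Assumption: the input has a "<N> connections watching <M> paths" line
# and a "Total watches:<T>" line, as in the output above.
wchs_summary() {
    awk '
        /connections watching/ { conns = $1; paths = $4 }
        /^Total watches:/      { split($0, kv, ":"); total = kv[2] }
        END {
            # Only print when both counts were found and are non-zero.
            if (conns > 0 && paths > 0)
                printf "connections=%d paths=%d watches=%d per_conn=%.1f per_path=%.1f\n",
                       conns, paths, total, total / conns, total / paths
        }'
}

# Usage (run on a ZK host):
# echo wchs | nc 127.0.0.1 2181 | wchs_summary
```

With the numbers above this works out to roughly 68 watches per connection and about 10 watches per path.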
There are a few errors on HS2, as below:
2018-07-23 07:01:19,321 ERROR [pool-7-thread-188]: server.TThreadPoolServer (TThreadPoolServer.java:run(297)) - Error occurred during processing of message.
2018-07-23 07:01:24,324 ERROR [pool-7-thread-188]: server.TThreadPoolServer (TThreadPoolServer.java:run(297)) - Error occurred during processing of message.
2018-07-23 07:01:29,327 ERROR [pool-7-thread-188]: server.TThreadPoolServer (TThreadPoolServer.java:run(297)) - Error occurred during processing of message.
2018-07-23 08:54:09,306 WARN [pool-7-thread-198]: conf.HiveConf (HiveConf.java:initialize(3093)) - HiveConf of name hive.log.file does not exist
2018-07-23 08:54:09,411 WARN [pool-7-thread-198]: conf.HiveConf (HiveConf.java:initialize(3093)) - HiveConf of name hive.log.dir does not exist
2018-07-23 08:54:09,411 WARN [pool-7-thread-198]: conf.HiveConf (HiveConf.java:initialize(3093)) - HiveConf of name hive.log.file does not exist
2018-07-23 11:00:04,463 ERROR [pool-7-thread-192]: security.JniBasedUnixGroupsMapping (JniBasedUnixGroupsMapping.java:logError(73)) - error looking up the name of group 1000000000: No such file or directory
For example, I ran "show databases;":
132 rows selected (56.052 seconds)
0: jdbc:hive2://zk3>
I ran it again using the same server/parameters:
132 rows selected (0.232 seconds)
0: jdbc:hive2://zk3>
56 seconds is actually a good case; sometimes we wait for hours.
Hey man @Eric Leme!
Sorry for the long delay, I've been ill the last few days 🙂
So regarding your issue, any progress on that?
I saw plenty of watches on ZK, which seems to be a sign worth paying attention to.
And as you mentioned, does this random problem only appear when there is heavy read/write traffic? If so, I'd say you might have some ZK performance issues, as you suspected.
To dig deeper into the problem, I suggest the following steps:
- Watch whether it only happens to Hive, or whether the other services that depend on ZK have the same issue.
- Enable DEBUG for ZK to see if anything shows up.
- Monitor the resources of the ZK hosts to spot peaks.
- Take some thread dumps with jstack, and check the GC logs for anything unusual as well.
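For the thread dumps, a single dump rarely tells much, so taking a few of them some seconds apart makes stuck threads easier to spot. Below is a minimal sketch, assuming jstack is on the PATH and is run as the same user that owns the JVM; collect_dumps, its arguments and the output file names are made up for illustration.

```shell
#!/bin/sh
# Take several thread dumps of one JVM, a few seconds apart.
# Arguments: PID, number of dumps, seconds between dumps, output directory.
# JSTACK may be set to point at a specific jstack binary (assumption:
# otherwise the jstack found on PATH is used).
collect_dumps() {
    pid=$1; count=$2; interval=$3; outdir=$4
    mkdir -p "$outdir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # One file per dump, numbered in the order they were taken.
        "${JSTACK:-jstack}" -l "$pid" > "$outdir/jstack_${pid}_${i}.txt" 2>&1
        # Sleep between dumps, but not after the last one.
        [ "$i" -lt "$count" ] && sleep "$interval"
        i=$((i + 1))
    done
}

# Usage: 5 dumps of HS2 (or a ZK server), 10 seconds apart
# collect_dumps <PID> 5 10 /tmp/thread_dumps
```

Threads that show the same stack in every dump are the ones worth looking at first.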
Also, take a look at this:
One last thing.. @Eric Leme
Not sure if it's gonna help you, but I did some research here.
ZooKeeper's algorithm is actually ZAB (ZooKeeper Atomic Broadcast) rather than Paxos, but the two are similar enough that these explanations of read/write costs in Paxos-based systems seem relevant:
Straight-forward implementations of replicated state machines built on top of multi-paxos suffer from two major performance limitations. One limitation affects writes and stems from the fact that Paxos requires a minimum of two round trips over the network (ignoring client/server communication) in order to achieve resolution. As a consequence, the addition of each link to the multi-paxos chain also requires two round trips over the network. This makes writes fairly expensive operations with relatively high-latency and low throughput.
The other limitation affects reads and stems from the fact that any peer can add links to the multi-paxos chain at any time. Because peers can fall behind one another, all read operations must first use a full Paxos instance to gain consensus on what the most recent state is before they can return a result that is guaranteed to be up-to-date. This makes reads equally as expensive as writes.
So my humble hypothesis is that, to serve the clients, the watcher does a full scan of the watched znode path to see whether the states have changed or not. That could cause some delay when reading the changed states (if needed) while the writes are in progress.
On the other hand, a lot of writes seem to be expensive for a small ZooKeeper ensemble.
To verify this (if you're allowed to do it), there's an interesting project called zk-smoketest for testing ZK server performance.
Hope this helps! 🙂