
Connection to HBase / ZooKeeper hangs.

Hello all

I have a MapReduce job that connects to HBase. I tested it on the Cloudera VM 5.0.0 (that is, in pseudo-distributed mode). Now I am trying to run it on a CDH5 cluster deployed on Amazon EC2.
Following advice found via Google, I updated the dependencies in my JAR to match the versions on the cluster: 0.96.1.1-cdh5.0.2 for HBase and 2.3.0-cdh5.0.2 for Hadoop. I also added the hbase.zookeeper.quorum property to my HBaseConfiguration (it was not necessary in pseudo-distributed mode):

Configuration mapReduceConfiguration = HBaseConfiguration.create();
mapReduceConfiguration.set("hbase.zookeeper.quorum", "domU-12-31-39-16-60-87.compute-1.internal");
mapReduceConfiguration.set("hbase.zookeeper.property.clientPort", "2181");
 
My job hangs after a successful (?) session is established with ZooKeeper. The stack trace is available here: http://pastebin.com/raw.php?i=GYAAwjfp

My HBase cluster consists of a master and 2 slaves. ZooKeeper has one server, on the same host as the HBase master. Some of these values appear in the logs, so here is a quick summary:
Master public DNS: ec2-54-197-217-239.compute-1.amazonaws.com
Master private DNS: domU-12-31-39-16-60-87.compute-1.internal
Slave01 public DNS: ec2-54-205-8-104.compute-1.amazonaws.com
Slave01 private DNS: ip-10-2-31-239.ec2.internal
Slave02 private DNS: ip-10-72-214-28.ec2.internal
 
Any idea what might be misconfigured?


Best regards,

Tomasz


Re: Connection to HBase / ZooKeeper hangs.

Right after the ZK connection is successfully established, the client attempts to talk to the RegionServer hosting the META region. I suspect this is where it is failing (and silently retrying).
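
One way to check this from the machine that runs the job is a plain TCP probe. The sketch below is only an illustration: the class and method names are made up, the hostnames are the ones listed in the question, 2181 is the ZooKeeper client port from the question, and 60020 is assumed to be the RegionServer port (the default in HBase 0.96; your cluster may set hbase.regionserver.port differently). It tests name resolution and TCP reachability only, not HBase itself.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of HBase or Hadoop.
final class ReachabilityProbe {

    /** Returns true if host resolves and a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        InetSocketAddress address = new InetSocketAddress(host, port);
        if (address.isUnresolved()) {
            return false; // the hostname could not be resolved from this machine
        }
        try (Socket socket = new Socket()) {
            socket.connect(address, timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or otherwise unreachable
        }
    }

    public static void main(String[] args) {
        // Hostnames taken from the question.
        String zooKeeperHost = "domU-12-31-39-16-60-87.compute-1.internal";
        String[] regionServerHosts = {
            "ip-10-2-31-239.ec2.internal",
            "ip-10-72-214-28.ec2.internal"
        };
        int zooKeeperPort = 2181;     // from the question
        int regionServerPort = 60020; // assumption: HBase 0.96 default

        System.out.println(zooKeeperHost + ":" + zooKeeperPort + " reachable = "
                + canConnect(zooKeeperHost, zooKeeperPort, 3000));
        for (String host : regionServerHosts) {
            System.out.println(host + ":" + regionServerPort + " reachable = "
                    + canConnect(host, regionServerPort, 3000));
        }
    }
}
```

If ZooKeeper is reachable but a RegionServer is not, name resolution and the EC2 security group rules would be the first things I would look at.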

If you give it a while (say, 10 minutes), or run jstack against the process, do you see any exceptions, or places where it is stuck trying to open a connection to your RegionServers? That information may help you proceed with troubleshooting the issue.
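
If attaching jstack to the process is awkward, roughly the same information can be printed from inside the JVM. Here is a minimal sketch using only the standard library. The class and method names are hypothetical, and in a real job you would have to call it from a separate thread or a debugging hook, since the main thread is the one that hangs.

```java
import java.util.Map;

// Hypothetical helper: an in-process approximation of what jstack prints.
final class ThreadDump {

    /** Builds a text dump with the name, state, and stack frames of every live thread. */
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append('"')
               .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

In the dump, frames from the HBase client or from socket connect/read calls are the ones worth looking at when the job appears stuck.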