Created 05-28-2017 08:10 AM
Hi,

I've been using HBase successfully with the Java client for months, and now I would like to write a Python application to access my data. I've read several posts about Thrift, but there is something I cannot figure out.

I installed a standalone HBase instance on my Linux VM, and it seems that I can run a Thrift server directly with the "hbase thrift start" command. Did I understand correctly? Does HBase provide an embedded Thrift server?

All the posts I've read suggest downloading Thrift and compiling it in order to generate language-specific (Python) bindings. Can't I use the "embedded" Thrift server from HBase for that? Does it mean that the downloaded and compiled Thrift will "only" be used to generate a few Python packages for my application to use, but will not be launched itself?

After following all the steps from https://acadgild.com/blog/connecting-hbase-with-python-application-using-thrift-server/, I get the following error when I try to execute table.py:
Traceback (most recent call last):
  File "table.py", line 1, in <module>
    from thrift.transport.TSocket import TSocket
ModuleNotFoundError: No module named 'thrift'
I don't understand what I missed here. Should I install a Thrift package for Python somehow (in addition to my generated bindings)?

Thanks for your help,
Regards,
Sebastien
Created 05-29-2017 05:56 AM
Hi,
The answer depends on which distribution and version you use, or whether you are using vanilla HBase. If you installed HDP 2.4, for example, here is a guide to start the Thrift server:
The error message indicates that you don't have the thrift module installed, which you need on the client side to run your Python program.
Depending on how you manage packages, you would need to install the thrift module, e.g., using pip:
pip install thrift
Once it is installed, at least this error message will disappear.
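If you want to confirm that the module is visible to the interpreter that runs table.py, here is a minimal sketch using only the Python standard library. The helper name is_installed is my own and not part of thrift or HBase; it only checks that the package can be found, not that your generated HBase bindings are on the path.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter."""
    # find_spec returns None when no finder can locate the top-level module.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("thrift"):
        print("thrift is importable")
    else:
        print("thrift is missing; install it, e.g. with: pip install thrift")
```

Run it with the same python executable you use for table.py, since pip may have installed the package for a different interpreter.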
Created 05-30-2017 07:47 PM
Thanks for your help, I got rid of the Python error.
However, I still cannot connect from my Python client to the HBase Thrift server (to answer your question, I'm trying to use vanilla HBase, without any other component), because I see the following error on the server side:
2017-05-30 08:01:11,816 WARN [thrift-worker-0-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connexion refusée
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-05-30 08:01:11,916 ERROR [thrift-worker-0] zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2017-05-30 08:01:11,917 WARN [thrift-worker-0] zookeeper.ZKUtil: hconnection-0x3330972d0x0, quorum=localhost:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:419)
    at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
    at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:905)
It seems that HBase is trying to reach ZooKeeper, but I didn't set up any ZooKeeper. Is it possible to run HBase without ZooKeeper?
Thanks again
Regards
Created 05-30-2017 07:47 PM
In fact, it was my fault: when my box came back from standby mode, I had not restarted HBase. It works now 🙂
Created 05-30-2017 07:52 PM
Cool, glad to see that you got it up and running yourself! If my answer was helpful, you can vote it up or mark it as the best answer. 🙂
Created 05-30-2017 08:25 PM
To answer your question regarding ZooKeeper: HBase needs ZooKeeper. If you didn't set up ZooKeeper yourself, HBase spins up an "internal" ZooKeeper server, which is great for testing but shouldn't be used in production scenarios.
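Since the "connection refused" on localhost:2181 simply meant that nothing was listening there, a quick port check from Python can tell you whether HBase (and its internal ZooKeeper) is actually running before you debug the client. This is only a rough sketch using the standard library: port 2181 comes from the log in this thread, 9090 is assumed to be the Thrift server's default port on your setup, and port_is_open is a name I made up for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 2181: ZooKeeper port seen in the log; 9090: assumed Thrift server port.
    for name, port in (("ZooKeeper", 2181), ("HBase Thrift server", 9090)):
        state = "reachable" if port_is_open("localhost", port) else "NOT reachable"
        print(f"{name} on localhost:{port} is {state}")
```

If the ZooKeeper port is not reachable on a standalone install, restarting HBase is the first thing to try, as it was in this thread.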