Created 11-20-2017 04:05 PM
I'm trying to read a table from Hive (Hortonworks) to collect some Twitter data for a machine learning project. I'm using PyHive, since pyhs2 does not support Python 3.6.
Here's my code:
import sys

import pandas as pd
from pyhive import hive

conn = hive.Connection(host='192.168.1.11', port=10000, auth='NOSASL')
df = pd.read_sql("SELECT * FROM my_table", conn)
print(sys.getsizeof(df))
df.head()
When I run it, I get this error:
Traceback (most recent call last):
  File "C:\Users\PWST112\Desktop\import.py", line 44, in <module>
    conn = hive.Connection(host='192.168.1.11', port=10000, auth='NOSASL')
  File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\pyhive\hive.py", line 164, in __init__
    response = self._client.OpenSession(open_session_req)
  File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\TCLIService\TCLIService.py", line 187, in OpenSession
    return self.recv_OpenSession()
  File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\TCLIService\TCLIService.py", line 199, in recv_OpenSession
    (fname, mtype, rseqid) = iprot.readMessageBegin()
  File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\protocol\TBinaryProtocol.py", line 148, in readMessageBegin
    name = self.trans.readAll(sz)
  File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\transport\TTransport.py", line 60, in readAll
    chunk = self.read(sz - have)
  File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\transport\TTransport.py", line 161, in read
    self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
  File "C:\Users\PWST112\AppData\Local\Programs\Python\Python36\lib\site-packages\thrift\transport\TSocket.py", line 132, in read
    message='TSocket read 0 bytes')
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
[Finished in 0.3s]
Here is the pip list:
beautifulsoup4 (4.6.0)
bleach (2.0.0)
colorama (0.3.9)
cycler (0.10.0)
decorator (4.0.11)
entrypoints (0.2.3)
ez-setup (0.9)
future (0.16.0)
html5lib (0.999999999)
impala (0.2)
ipykernel (4.6.1)
ipython (6.1.0)
ipython-genutils (0.2.0)
ipywidgets (6.0.0)
jedi (0.10.2)
Jinja2 (2.9.6)
jsonschema (2.6.0)
jupyter (1.0.0)
jupyter-client (5.1.0)
jupyter-console (5.1.0)
jupyter-core (4.3.0)
konlpy (0.4.4)
MarkupSafe (1.0)
matplotlib (2.0.2)
mistune (0.7.4)
nbconvert (5.2.1)
nbformat (4.3.0)
nltk (3.2.4)
notebook (5.0.0)
numpy (1.13.1+mkl)
pandas (0.20.3)
pandocfilters (1.4.1)
pickleshare (0.7.4)
pip (9.0.1)
prompt-toolkit (1.0.14)
pure-sasl (0.4.0)
Pygments (2.2.0)
PyHive (0.5.0)
pyhs2 (0.6.0)
pyparsing (2.2.0)
python-dateutil (2.6.0)
pytz (2017.2)
pyzmq (16.0.2)
qtconsole (4.3.0)
sasl (0.2.1)
scikit-learn (0.18.2)
scipy (0.19.1)
setuptools (28.8.0)
simplegeneric (0.8.1)
six (1.10.0)
testpath (0.3.1)
thrift (0.10.0)
thrift-sasl (0.3.0)
tornado (4.5.1)
traitlets (4.3.2)
wcwidth (0.1.7)
webencodings (0.5.1)
wheel (0.30.0)
widgetsnbextension (2.0.0)
Can somebody help?
I have my sandbox configured for "NONE" authentication, since the NOSASL option is not available.
Best regards
Created 01-22-2018 10:19 AM
Used impyla.
Works like a charm 🙂
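For anyone landing here later, a rough sketch of what the impyla route can look like. This is an assumption-laden example, not the exact code used above: the host and table name are taken from the question, the user name "hive" and the LIMIT are placeholders, and auth_mechanism='PLAIN' is the setting impyla documents for non-secured Hive (server-side authentication NONE). The helper names are made up for illustration.

```python
def impyla_connect_kwargs(host, port=10000, user="hive"):
    """Build keyword arguments for impala.dbapi.connect against HiveServer2.

    Assumption: the server runs with hive.server2.authentication=NONE,
    which impyla addresses with auth_mechanism='PLAIN'.
    The user name "hive" is a placeholder.
    """
    return {
        "host": host,
        "port": port,
        "auth_mechanism": "PLAIN",
        "user": user,
    }


def fetch_table(table, limit=100, **connect_kwargs):
    """Read the first `limit` rows of a table into a pandas DataFrame."""
    # Imported lazily so the helper above can be used without impyla installed.
    from impala.dbapi import connect
    from impala.util import as_pandas

    conn = connect(**connect_kwargs)
    try:
        cur = conn.cursor()
        # The table name is interpolated directly; only pass trusted names.
        cur.execute("SELECT * FROM {} LIMIT {}".format(table, int(limit)))
        return as_pandas(cur)
    finally:
        conn.close()


# Example call (needs a reachable HiveServer2):
# df = fetch_table("my_table", **impyla_connect_kwargs("192.168.1.11"))
# print(df.head())
```

The package to install is impyla (imported as impala); the pip list in the question shows a package named impala instead, which may or may not be the same thing.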
Created 06-01-2018 02:25 PM
I had a similar problem with PyHive on my Hortonworks setup. The failure was always immediate, so it was not the timeout issue that some people online were pointing to.
It turned out to be hive.server2.transport.mode. If it is set to binary, PyHive works like a charm. If it is set to http, PyHive does not work.
Also found https://github.com/dropbox/PyHive/issues/69 which talks about this. HTH.
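A small sketch of how the two server settings could be checked before connecting with PyHive. It assumes you have looked up hive.server2.transport.mode and hive.server2.authentication yourself (for example in Ambari or hive-site.xml); the function names and the "hive" user name are made up for illustration. It also reflects my understanding that a server set to NONE still speaks plain SASL, so the client should pass auth='NONE' rather than 'NOSASL', which may be what bit the original poster.

```python
def pyhive_connection_kwargs(host, server_auth, transport_mode,
                             port=10000, username="hive"):
    """Map HiveServer2 settings to keyword arguments for pyhive.hive.Connection.

    server_auth    -- value of hive.server2.authentication ("NONE" or "NOSASL")
    transport_mode -- value of hive.server2.transport.mode ("binary" or "http")
    """
    if transport_mode.strip().lower() != "binary":
        # Per the report above, http mode did not work with PyHive here.
        raise ValueError(
            "hive.server2.transport.mode is %r; set it to 'binary'" % transport_mode
        )

    auth = server_auth.strip().upper()
    if auth not in ("NONE", "NOSASL"):
        raise ValueError("this sketch only covers NONE and NOSASL, got %r" % server_auth)

    kwargs = {"host": host, "port": port, "auth": auth}
    if auth == "NONE":
        # Plain SASL sends a user name; "hive" is a placeholder.
        kwargs["username"] = username
    return kwargs


def open_connection(**kwargs):
    """Open the PyHive connection (needs a reachable HiveServer2)."""
    from pyhive import hive  # imported lazily; auth='NONE' also needs sasl/thrift-sasl
    return hive.Connection(**kwargs)


# Example call:
# conn = open_connection(**pyhive_connection_kwargs("192.168.1.11", "NONE", "binary"))
```

If the client and server disagree on either setting, the server tends to drop the socket right away, which would match the immediate "TSocket read 0 bytes" in the question.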