
Is there a working Python Hive library that connects to a kerberised cluster?

Expert Contributor

I have tried using the following Python libraries to connect to a kerberised Hive instance:

PyHive, Impyla, and pyhs2

None of them seem to be able to connect.

Here is the error message I see when using Impyla:

>>> from impala.dbapi import connect
>>> conn = connect(host='hdpmaster.hadoop',port=10000,database='default',auth_mechanism='GSSAPI',kerberos_service_name='user1')


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/impala/dbapi.py", line 147, in connect
    auth_mechanism=auth_mechanism)
  File "/usr/local/lib/python2.7/dist-packages/impala/hiveserver2.py", line 658, in connect
    transport.open()
  File "/usr/local/lib/python2.7/dist-packages/thrift_sasl/__init__.py", line 72, in open
    message=("Could not start SASL: %s" % self.sasl.getError()))
thrift.transport.TTransport.TTransportException: Could not start SASL: Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (Server not found in Kerberos database)

Does anyone have a working connection string?

Thanks, Dale

1 ACCEPTED SOLUTION

Expert Contributor

This connection string will work as long as the user running the script has a valid Kerberos ticket:

import pyhs2

# Requires a valid Kerberos ticket in the ticket cache (e.g. obtained via kinit)
with pyhs2.connect(host='beelinehost@hadoop.com',
                   port=10000,
                   authMechanism="KERBEROS") as conn:
    with conn.cursor() as cur:
        print cur.getDatabases()

Username, password, and other configuration parameters do not need to be passed; authentication is handled through the KDC.
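A related note on the Impyla attempt from the question: the "Server not found in Kerberos database" error usually means kerberos_service_name does not match HiveServer2's service principal, which is typically hive rather than a user name. A minimal sketch (not a tested configuration), assuming the principal is hive/<host>@REALM and a ticket has already been obtained with kinit:

from impala.dbapi import connect

# Assumes a valid ticket in the cache (kinit) and a HiveServer2 principal of hive/_HOST@REALM
conn = connect(host='hdpmaster.hadoop',
               port=10000,
               database='default',
               auth_mechanism='GSSAPI',
               kerberos_service_name='hive')  # the service part of the principal, not a user name
cur = conn.cursor()
cur.execute('SHOW DATABASES')
print(cur.fetchall())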


REPLIES


Explorer

Sigh. I've looked at literally dozens of forum posts, and every single one has a different connection string: "this works for me", "that works for me", "you have to have thrift.py version whatever", "this set of modules with these specific versions works for me."

 

It's hopeless. There is simply NO generalized version of an ODBC or OLEDB connection string that will work to connect to a kerberized HiveServer2. For SQL Server it's so eeeasy and siiimple, and it ALWAYS works if you have the right drivers set up:

 

TheConnectionString = "DRIVER={SQL Server};" & _
"SERVER=servername;" & _
"Database=databasename;" & _
"UID=windowsuserid;" & _
"PWD=windowspassword"

 

So apparently for Python connecting to HiveServer2 you have to experiment and experiment and experiment for a long, long time, trying dozens of different strings, and then one of them will FINALLY work.
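For what it's worth, a Kerberos connection through PyHive tends to take a fairly standard shape once the client-side pieces are in place (the sasl and thrift_sasl packages plus a valid ticket from kinit). A minimal sketch, with the host name as a placeholder and assuming HiveServer2 uses the default binary transport on port 10000 with a hive/<host>@REALM principal:

from pyhive import hive

# Assumes sasl + thrift_sasl are installed, kinit has been run,
# and HiveServer2 listens on the default binary transport (port 10000)
conn = hive.Connection(host='hs2-host.example.com',   # placeholder host name
                       port=10000,
                       auth='KERBEROS',
                       kerberos_service_name='hive')  # service part of the HiveServer2 principal
cur = conn.cursor()
cur.execute('SHOW DATABASES')
print(cur.fetchall())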

New Contributor

Yes, it's a big SIGH!!! I've tried dozens of different connection strings, from installing an older version of Python (3.7.4) so I could install sasl and PyHive to basically everything else I could find out there, but it's still not working yet.

 

So, basically my setup is Hive on Azure HDInsight, and the DB connection uses a server/host like "<server>.azurehdinsight.net" with port 443. I'm using DBeaver to connect to the Hive DB, and it uses a JDBC URL - the complete URL is something like "jdbc:hive2://<server>.azurehdinsight.net:443/default;transportMode=http;ssl=true;httpPath=/hive2". Can someone please help me out with what packages I need in order to successfully query Hive from Python?
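One option that reuses that exact JDBC URL from Python is JayDeBeApi, which drives the same Hive JDBC jar that DBeaver uses (it needs a JVM plus the JPype1 package). This is only a sketch, not a tested HDInsight setup; the jar path and credentials are placeholders:

import jaydebeapi

# Same JDBC URL that works in DBeaver (<server> left as a placeholder)
jdbc_url = ("jdbc:hive2://<server>.azurehdinsight.net:443/default;"
            "transportMode=http;ssl=true;httpPath=/hive2")

conn = jaydebeapi.connect(
    "org.apache.hive.jdbc.HiveDriver",        # driver class inside the Hive JDBC jar
    jdbc_url,
    ["<cluster-login-user>", "<cluster-login-password>"],  # placeholder credentials
    "/path/to/hive-jdbc-standalone.jar")      # placeholder path to the standalone driver jar
cur = conn.cursor()
cur.execute("SHOW DATABASES")
print(cur.fetchall())
cur.close()
conn.close()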

 

@pudnik26354 - can you please post what worked for you? Thank you so much.