Created 09-23-2016 03:14 PM
I have tried using the following Python libraries to connect to a Kerberized Hive instance: PyHive, Impyla, and Pyhs2.
None of them seem to be able to connect.
Here is the error message I see when using Impyla:
>>> from impala.dbapi import connect
>>> conn = connect(host='hdpmaster.hadoop', port=10000, database='default', auth_mechanism='GSSAPI', kerberos_service_name='user1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/impala/dbapi.py", line 147, in connect
    auth_mechanism=auth_mechanism)
  File "/usr/local/lib/python2.7/dist-packages/impala/hiveserver2.py", line 658, in connect
    transport.open()
  File "/usr/local/lib/python2.7/dist-packages/thrift_sasl/__init__.py", line 72, in open
    message=("Could not start SASL: %s" % self.sasl.getError()))
thrift.transport.TTransport.TTransportException: Could not start SASL: Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Server not found in Kerberos database)
Does anyone have a working connection string?
Thanks, Dale
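The "Server not found in Kerberos database" part of that trace usually means kerberos_service_name does not match the HiveServer2 service principal; for Hive it is normally 'hive', not a user name. Below is a minimal sketch of the same Impyla call with that change - it assumes the principal really is hive/<host>@REALM and that kinit has already been run:

from impala.dbapi import connect

# Sketch only: 'hive' is an assumption about the service principal,
# and a valid ticket must already be in the credential cache (kinit).
conn = connect(host='hdpmaster.hadoop',
               port=10000,
               database='default',
               auth_mechanism='GSSAPI',
               kerberos_service_name='hive')   # service principal, not a login user
cur = conn.cursor()
cur.execute('SHOW DATABASES')
print(cur.fetchall())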
Created 09-26-2016 09:00 AM
This connection string will work as long as the user running the script has a valid kerberos ticket:
import pyhs2

with pyhs2.connect(host='beelinehost@hadoop.com',
                   port=10000,
                   authMechanism="KERBEROS") as conn:
    with conn.cursor() as cur:
        print cur.getDatabases()
Username, password, and other configuration parameters are not passed in the connect call; authentication is handled by the Kerberos ticket obtained from the KDC.
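Since the identity comes from the ticket cache rather than the connect call, a quick way to confirm a ticket is actually there before connecting is to shell out to klist (a sketch; it assumes the MIT Kerberos client tools are on the PATH):

import subprocess

# 'klist -s' is silent and exits 0 only if the cache holds a valid,
# unexpired ticket; any other exit code means kinit is needed first.
if subprocess.call(['klist', '-s']) != 0:
    raise RuntimeError('No valid Kerberos ticket found - run kinit first')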
Created 09-24-2016 02:08 PM
Found an answer on Stack Overflow; can you try it and validate? http://stackoverflow.com/questions/29814207/python-connect-to-hive-use-pyhs2-and-kerberos-authentica...
Created 03-27-2020 03:38 PM
Sigh. I've looked at literally dozens of forum posts, and every single one has a different connection string. 'This works for me.' 'That works for me.' 'You have to have thrift.py version whatever.' 'This set of modules with these specific versions works for me.'
It's hopeless. There is simply NO generalized version of an ODBC or OLEDB connection string that will work to connect to a Kerberized HiveServer2. For SQL Server it's so eeeasy and siiimple, and it ALWAYS works if you have the right drivers set up:
TheConnectionString = "DRIVER={SQL Server};" & _
"SERVER=servername;" & _
"Database=databasename;" & _
"UID=windowsuserid;" & _
"PWD=windowspassword"
So apparently for Python connecting to HiveServer2, you have to experiment and experiment and experiment for a long, long time, trying dozens of different strings, and then one of them will FINALLY work.
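For what it's worth, the closest thing to a "standard" string for a Kerberized HiveServer2 over the default binary transport seems to be the PyHive form below. It is a sketch only: it assumes PyHive plus the sasl/thrift_sasl packages are installed, that the service principal is 'hive', that the host name is a placeholder, and that a ticket already exists from kinit.

from pyhive import hive

# Kerberos (GSSAPI) over the binary Thrift transport. No user/password
# here: the identity comes entirely from the Kerberos ticket cache.
conn = hive.Connection(host='hiveserver2.example.com',   # placeholder host
                       port=10000,
                       database='default',
                       auth='KERBEROS',
                       kerberos_service_name='hive')     # service principal name
cur = conn.cursor()
cur.execute('SHOW DATABASES')
print(cur.fetchall())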
Created on 11-25-2020 09:14 AM - edited 11-25-2020 09:15 AM
Yes, it's a big SIGH!!! I've tried dozens of different connection strings, from installing an older version of Python (3.7.4) so I could install sasl and PyHive, to basically everything else I could find out there, but it's still not working.
So, basically my setup is Hive on Azure, and the DB connections use a server/host like "<server>.azurehdinsight.net" on port 443. I'm using DBeaver to connect to the Hive DB, and it uses a JDBC URL - the complete URL is something like "jdbc:hive2://<server>.azurehdinsight.net:443/default;transportMode=http;ssl=true;httpPath=/hive2". Can someone please help me out with what packages I need in order to successfully query Hive from Python?
@pudnik26354 - can you please post what worked for you? Thank you so much.
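Since that JDBC URL already works in DBeaver, one option is to reuse it unchanged from Python through JayDeBeApi and the Hive JDBC driver. This is a sketch only: the jar path and the login are placeholders, it needs a JVM plus the jaydebeapi/JPype1 packages, and HDInsight's HTTPS gateway normally expects the cluster admin credentials.

import jaydebeapi

# The exact JDBC URL that works in DBeaver (HTTP transport + SSL).
jdbc_url = ('jdbc:hive2://<server>.azurehdinsight.net:443/default;'
            'transportMode=http;ssl=true;httpPath=/hive2')

conn = jaydebeapi.connect(
    'org.apache.hive.jdbc.HiveDriver',       # Hive JDBC driver class
    jdbc_url,
    ['admin', '<cluster-login-password>'],   # placeholder HDInsight credentials
    '/path/to/hive-jdbc-standalone.jar')     # placeholder path to the driver jar

cur = conn.cursor()
cur.execute('SHOW DATABASES')
print(cur.fetchall())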