Is there a working Python Hive library that connects to a kerberised cluster?

I have tried the following Python libraries to connect to a kerberised Hive instance: PyHive, Impyla, and Pyhs2.

None of them was able to connect.

Here is the error message I see when using Impyla:

>>> from impala.dbapi import connect
>>> conn = connect(host='hdpmaster.hadoop',port=10000,database='default',auth_mechanism='GSSAPI',kerberos_service_name='user1')


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/impala/dbapi.py", line 147, in connect
    auth_mechanism=auth_mechanism)
  File "/usr/local/lib/python2.7/dist-packages/impala/hiveserver2.py", line 658, in connect
    transport.open()
  File "/usr/local/lib/python2.7/dist-packages/thrift_sasl/__init__.py", line 72, in open
    message=("Could not start SASL: %s" % self.sasl.getError()))
thrift.transport.TTransport.TTransportException: Could not start SASL: Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (Server not found in Kerberos database)

Does anyone have a working connection string?

Thanks, Dale

Accepted Solutions

This connection string works as long as the user running the script has a valid Kerberos ticket:

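import pyhs2

# Requires a valid Kerberos ticket for the user running the script.
with pyhs2.connect(host='beelinehost@hadoop.com',
                   port=10000,
                   authMechanism="KERBEROS") as conn:
    with conn.cursor() as cur:
        # List the databases visible to the authenticated user.
        print(cur.getDatabases())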

No username, password, or other configuration parameters are passed in the connection call; authentication relies on the Kerberos ticket.
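
For the Impyla attempt in the question, "Server not found in Kerberos database" usually means the client asked the KDC for a service principal that does not exist. The call there passes kerberos_service_name='user1', which looks like a user name rather than the service name. The sketch below assumes the HiveServer2 principal has the common form hive/<host>@REALM, so the service name is 'hive'; check your cluster's configuration before relying on that. The helper names (has_kerberos_ticket, hive_kerberos_kwargs, list_databases) are made up for illustration, and the ticket check assumes a klist that supports the -s flag, as MIT Kerberos does.

```python
import subprocess


def has_kerberos_ticket():
    """Return True if `klist -s` reports a valid ticket cache."""
    try:
        # -s runs silently and signals validity through the exit status.
        return subprocess.call(["klist", "-s"]) == 0
    except OSError:
        # klist is not installed or not on PATH.
        return False


def hive_kerberos_kwargs(host, port=10000, database="default",
                         service_name="hive"):
    """Build keyword arguments for impala.dbapi.connect (illustrative helper)."""
    return {
        "host": host,
        "port": port,
        "database": database,
        "auth_mechanism": "GSSAPI",
        # Assumption: primary of the HiveServer2 service principal,
        # not the name of the end user.
        "kerberos_service_name": service_name,
    }


def list_databases(host):
    """Connect with Impyla and return the database names."""
    # Imported here so the helpers above can be used without Impyla installed.
    from impala.dbapi import connect

    if not has_kerberos_ticket():
        raise RuntimeError("No valid Kerberos ticket; run kinit first")
    conn = connect(**hive_kerberos_kwargs(host))
    try:
        cur = conn.cursor()
        cur.execute("SHOW DATABASES")
        return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()
```

Calling list_databases('hdpmaster.hadoop') would then mirror the original attempt, with only the service name changed. The host must be the fully qualified name that appears in the service principal.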

Re: Is there a working Python Hive library that connects to a kerberised cluster?

Sigh. I've looked at literally dozens of forum posts, and every single one has a different connection string: 'this works for me', 'that works for me', 'you have to have thrift.py version whatever', 'this set of modules with these specific versions works for me'.

It's hopeless. There is simply NO generalized ODBC or OLE DB connection string that will connect to a kerberized HiveServer2. For SQL Server it's so eeeasy and siiimple, and it ALWAYS works if you have the right drivers set up:

TheConnectionString = "DRIVER={SQL Server};" & _
"SERVER=servername;" & _
"Database=databasename;" & _
"UID=windowsuserid;" & _
"PWD=windowspassword"

So apparently, to connect to HiveServer2 from Python, you have to experiment for a long, long time, trying dozens of different strings, until one of them FINALLY works.
