
Python connect to Hadoop using Hive with Kerberos authentication

import pyhs2

conn_config = {'krb_host': 'hostname', 'krb_service': 'hive'}
conn = pyhs2.connect(host='hostname',
                     port=10000,
                     authMechanism="KERBEROS",
                     password="********",
                     user='hostname@XXXXXXXXXXXXXXXXXXXX')
Error Encountered:
 user='XXXXXXX@XXXXXXXXXXXXXXXXXXXX')
  File "build\bdist.win-amd64\egg\pyhs2\__init__.py", line 7, in connect
  File "build\bdist.win-amd64\egg\pyhs2\connections.py", line 46, in __init__
  File "build\bdist.win-amd64\egg\pyhs2\cloudera\thrift_sasl.py", line 66, in open
thrift.transport.TTransport.TTransportException: Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2

I checked the Stack Overflow thread below, but I still have the same problem:
http://stackoverflow.com/questions/29814207/python-connect-to-hive-use-pyhs2-and-kerberos-authentica...
Can someone please help me solve this problem? Your help is much appreciated.
7 REPLIES


@Artem Ervits, can you please help me solve this issue?

It is much appreciated.


@Dale Bradman, can you please help me solve this issue?

It is much appreciated.



@siddharth peesary

1. You must install a Kerberos client on your PC.

2. You must obtain a Kerberos ticket with kinit.

(screenshot: 11205-krb5.png)

3. Then install the impyla package (the script below imports connect from impala.dbapi).

4. Then write test.py like this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# hive util with hive server2
from impala.dbapi import connect


class HiveClient:
    def __init__(self, db_host, port, authMechanism, user, password, database, kbservice):
        self.conn = connect(host=db_host,
                            port=port,
                            auth_mechanism=authMechanism,
                            user=user,
                            password=password,
                            database=database,
                            kerberos_service_name=kbservice
                            )


    def query(self, sql):
        with self.conn.cursor() as cursor:
            cursor.execute(sql)
            return cursor.fetchall()


    def close(self):
        self.conn.close()


if __name__ == '__main__':
    hive_client = HiveClient(db_host='namenode02.xxx.com', port=10000, authMechanism='GSSAPI', user='hive', password='',
                             database='data_mp_raw', kbservice='hive')
    sql = "select dt, hour, vid, sum(1) as play_num from tbl_mp_stream where dt='20161204' and hour='12' and vid is not null group by dt, hour, vid order by play_num desc limit 50"
    # override with a simple query for a quick connectivity test
    sql = "SHOW TABLES"
    result = hive_client.query(sql)
    hive_client.close()
    for x in result:
        print(x)

5. The result looks like this (my Python version is 3.5.2):

(screenshot: 11206-test-hive.png)
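As a sanity check for step 2, you can also verify from Python that a valid ticket cache exists before attempting the connection. This is only a sketch of mine, assuming the MIT Kerberos client tools are on the PATH (note that the klist.exe built into Windows does not accept the -s flag):

```python
import subprocess

def has_kerberos_ticket():
    """Return True if `klist -s` reports a valid (unexpired) ticket cache."""
    try:
        # -s: silent mode; exit status 0 means a valid ticket exists
        result = subprocess.run(["klist", "-s"],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        return result.returncode == 0
    except FileNotFoundError:
        # Kerberos client tools not installed or not on PATH
        return False
```

Calling has_kerberos_ticket() right before connect() lets you distinguish "no ticket" from genuine SASL configuration problems.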

hive_client = HiveClient(db_host='XXXXXXXXX.scglobaluat.aduat.scotiacapital.com', port=10000,
                         authMechanism='GSSAPI', user='XXXXXXX', password='XXXXXX',
                         database='tsz', kbservice='hive')

Below is the error I get; please let me know where the issue is:

Traceback (most recent call last):
  File "C:/Users/speesary/connct_server", line 33, in <module>
    database='tsz', kbservice='hive')
  File "C:/Users/speesary/connct_server", line 17, in __init__
    kerberos_service_name=kbservice
  File "C:\Users\speesary\AppData\Roaming\Python\Python27\site-packages\impala\dbapi.py", line 147, in connect
    auth_mechanism=auth_mechanism)
  File "C:\Users\speesary\AppData\Roaming\Python\Python27\site-packages\impala\hiveserver2.py", line 658, in connect
    transport.open()
  File "C:\Users\speesary\AppData\Roaming\Python\Python27\site-packages\thrift_sasl\__init__.py", line 72, in open
    message=("Could not start SASL: %s" % self.sasl.getError()))
thrift.transport.TTransport.TTransportException: Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2

Process finished with exit code 1


pyhs2 does not support Python 3.
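A small guard of my own (not part of pyhs2) that fails fast with a clear message, instead of an obscure error later, when a pyhs2-based script is run under Python 3:

```python
import sys

def check_pyhs2_compat():
    """pyhs2 is Python 2 only; raise a clear error when running on Python 3."""
    if sys.version_info[0] >= 3:
        raise RuntimeError(
            "pyhs2 does not support Python 3; use impyla or PyHive instead"
        )
```

Call check_pyhs2_compat() at the top of the script, before importing pyhs2.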


The DSN-less connection string below FINALLY worked for me on Windows 10. I created a file DSN, then copied and pasted the string into the Python code as a template.

Three lessons I learned from this struggle:

1) Kerberos is CASE SENSITIVE: the Kerberos realm in the string MUST be uppercase.

2) The Cloudera driver doesn't like spaces between the semicolons in the string. Avoid them.

3) If you don't need connection pooling, turn it off with a pyodbc.pooling = False statement.

import pyodbc

strFileDSNAsAstring = ("DRIVER=Cloudera ODBC Driver for Apache Hive;USEUNICODESQLCHARACTERTYPES=1;"
                       "SSL=0;SERVICEPRINCIPALCANONICALIZATION=0;SERVICEDISCOVERYMODE=0;SCHEMA=database;PORT=port;"
                       "KRBSERVICENAME=hive;KRBREALM=uppercaserealm;KRBHOSTFQDN=hostfqdndomain;INVALIDSESSIONAUTORECOVER=1;"
                       "HOST=host;HIVESERVERTYPE=2;GETTABLESWITHQUERY=0;ENABLETEMPTABLE=0;DESCRIPTION=Hive;"
                       "DELEGATEKRBCREDS=0;AUTHMECH=1;ASYNCEXECPOLLINTERVAL=100;APPLYSSPWITHQUERIES=1;CAIssuedCertNamesMismatch=1;")

try:
    # disable connection pooling before the first connection is made
    pyodbc.pooling = False
    conn = pyodbc.connect(strFileDSNAsAstring, autocommit=True)
except pyodbc.Error:
    print("failure.")
else:
    conn.close()
    print("success.")
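The three lessons above can also be enforced mechanically. Below is a small helper of my own (not part of pyodbc) that assembles the keyword string from a dict, uppercases the realm, and joins the pairs with semicolons and no stray spaces:

```python
def build_hive_dsn(params):
    """Assemble a DSN-less ODBC connection string from a dict of keywords."""
    params = dict(params)
    realm = params.get("KRBREALM")
    if realm is not None:
        # Kerberos realms are case sensitive and conventionally uppercase
        params["KRBREALM"] = realm.upper()
    # join with semicolons only -- the Cloudera driver dislikes stray spaces
    return ";".join("%s=%s" % (k, v) for k, v in params.items()) + ";"
```

For example, build_hive_dsn({"DRIVER": "Cloudera ODBC Driver for Apache Hive", "KRBREALM": "example.com"}) yields a string with the realm already uppercased, ready to pass to pyodbc.connect.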