I am trying to connect to HiveServer2 (HDP 2.2) through the Knox gateway and am looking for a Python API and some basic connection settings. Our cluster is Kerberized and SSL is enabled for Knox.
This describes the configuration needed for ODBC clients (ODBC > HTTP > Knox > HTTP > Hive Server 2), using Thrift's support for JDBC/ODBC access over HTTP with hive.server2.transport.mode set to http.
From Python, you can use a library like pyodbc to connect over ODBC. There are also projects like https://github.com/dropbox/PyHive, but I haven't tested connecting with HTTP transport mode.
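A minimal sketch of the pyodbc route, assuming a DSN named "hive-knox-qa" is defined in odbc.ini and the Hortonworks Hive ODBC driver is installed (both names are placeholders; substitute your own):

```python
def dsn_conn_str(dsn, **extra):
    """Build a pyodbc connection string for a named DSN; extra keyword
    arguments become additional KEY=VALUE attributes."""
    attrs = [f"DSN={dsn}"] + [f"{k}={v}" for k, v in extra.items()]
    return ";".join(attrs)

def fetch_tables(dsn="hive-knox-qa"):
    # Deferred import so the string helper above works even where the
    # driver/pyodbc isn't installed.
    import pyodbc
    # Hive has no transactions, so open the connection in autocommit mode.
    con = pyodbc.connect(dsn_conn_str(dsn), autocommit=True)
    try:
        cur = con.cursor()
        cur.execute("SHOW TABLES")
        return [row[0] for row in cur.fetchall()]
    finally:
        con.close()
```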
I tried pyodbc with the Hortonworks ODBC driver (odbc.ini settings below). I have pointed TrustedCerts at the certificate we use to authenticate our Knox connection. However, I end up with an "invalid sasl status" error (is this something to do with my authentication?). Am I missing something in the driver settings?
```
>>> con = pyodbc.connect('DSN=hive-knox-qa')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pyodbc.Error: ('HY000', '[HY000] [unixODBC][Hortonworks][Hardy] (34) Error from server: invalid sasl status. (34) (SQLDriverConnect)')
```
```
[hive-knox-qa]
Driver=/usr/lib/hive/lib/native/Linux-amd64-64/libhortonworkshiveodbc64.so
HOST=<myKnoxQAServer>
PORT=8443
HiveServerType=2
AuthMech=3
UID=<myUsernameHere>
PWD=<myPasswordHere>
SSL=1
AllowSelfSignedServerCert=1
CAIssuedCertNamesMismatch=1
TrustedCerts=/usr/lib/hive/lib/native/Linux-amd64-64/cacerts.pem
HTTPPath=gateway/default/hive
ThriftTransport=1
```
Still the same error:

```
pyodbc.Error: ('HY000', '[HY000] [unixODBC][Hortonworks][Hardy] (34) Error from server: invalid sasl status. (34) (SQLDriverConnect)')
```
If HiveServer2 is Kerberized, you'll need a different form of the connection string, with AuthMech=1 and without a username and password.
From the ODBC Driver User Guide:
Example connection string:

```
Host=<hs2 host>;Port=<hs2 port>;HiveServerType=2;AuthMech=1;Schema=<Hive database>;KrbRealm=<Kerberos Realm>;KrbHostFQDN=<hs2 fully qualified domain name>;KrbServiceName=<hs2 service name>
```
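The guide's connection string can be assembled in Python before handing it to pyodbc.connect. A sketch, assuming the driver's default library path and placeholder host/realm values (pass the result to pyodbc.connect with autocommit=True; a valid Kerberos ticket from kinit must already exist):

```python
def kerberos_conn_str(host, port=10000, schema="default",
                      realm="", service="hive"):
    """Build an AuthMech=1 (Kerberos) connection string for the
    Hortonworks Hive ODBC driver, per the ODBC Driver User Guide."""
    parts = {
        "Driver": "/usr/lib/hive/lib/native/Linux-amd64-64/"
                  "libhortonworkshiveodbc64.so",
        "Host": host,
        "Port": port,
        "HiveServerType": 2,
        "AuthMech": 1,          # 1 = Kerberos; no UID/PWD needed
        "Schema": schema,
        "KrbHostFQDN": host,    # fully qualified domain name of HS2
        "KrbServiceName": service,
    }
    if realm:                   # only needed when no default realm applies
        parts["KrbRealm"] = realm
    return ";".join(f"{k}={v}" for k, v in parts.items())
```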
In the Realm field, type the Kerberos realm of the HiveServer2 host; otherwise leave it blank. The realm is only needed if your Kerberos setup does not define a default realm, or if the realm of your HiveServer2 host is not the default.
In the Host FQDN field, type the value for the fully qualified domain name of the HiveServer2 host.
In the Service Name field, type the service name of the HiveServer2. For example, if the principal for the HiveServer2 is "hive/fully.qualified.domain.name@YOUR-REALM.COM", then the value in the Service Name field should be hive.
I think direct connections to our Hive services are blocked and we can only go through Knox. I got past the SASL error by switching ThriftTransport to HTTP mode. Thanks for your support :-)
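For reference, a sketch of the working DSN, assuming the Simba-based Hortonworks driver where ThriftTransport=2 selects HTTP transport (0 = Binary, 1 = SASL, 2 = HTTP); it differs from the earlier odbc.ini only in the transport setting:

```
[hive-knox-qa]
Driver=/usr/lib/hive/lib/native/Linux-amd64-64/libhortonworkshiveodbc64.so
HOST=<myKnoxQAServer>
PORT=8443
HiveServerType=2
AuthMech=3
UID=<myUsernameHere>
PWD=<myPasswordHere>
SSL=1
AllowSelfSignedServerCert=1
CAIssuedCertNamesMismatch=1
TrustedCerts=/usr/lib/hive/lib/native/Linux-amd64-64/cacerts.pem
HTTPPath=gateway/default/hive
ThriftTransport=2
```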