Python3 connection to Kerberos Hbase thrift HTTPS

Explorer

Hi Community member,

We have a Python 3 application that connects to HBase and fetches data.

The connection worked fine with the Kerberos HBase Thrift binary protocol (over TSocket) until the Hadoop team moved the Hadoop system to Cloudera and Cloudera Manager, which starts Kerberos HBase Thrift in HTTPS mode.

The transport has now changed from TSocket to HTTP/HTTPS, and the Python code cannot authenticate using the HTTP client with SASL Kerberos.

The Python version in use is 3.6.8, and the package versions are:

thrift=0.13.0

hbase-thrift=0.20.4

pure_sasl=0.5.1

 

Working code in TSocket mode:

############

from subprocess import call

import jprops
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase
from hbase.ttypes import *

# Read cluster.properties
with open('/data/properties/cluster.properties') as fp:
    properties = jprops.load_properties(fp)


# Obtain a Kerberos ticket from the keytab
def kerberos_ticket():
    principal = properties["principal"]
    # Assumption: the original post did not show where keyTab is defined;
    # here it is read from a "keyTab" entry in cluster.properties.
    keyTab = properties["keyTab"]
    # Pass the command as a list so no shell is needed
    call(["kinit", "-kt", keyTab, principal])


# HBase connection
def hbase_connection():
    thriftHost = properties["thriftHost"]
    hbaseService = properties["hbaseService"]
    Tsock = TSocket.TSocket(thriftHost, 9090)
    Tsock.setTimeout(2000000)  # timeout in milliseconds
    transport = TTransport.TSaslClientTransport(
        Tsock,
        host=thriftHost,
        service=hbaseService,
        mechanism='GSSAPI'
    )
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = Hbase.Client(protocol)
    return client, transport


# Get a Kerberos ticket
kerberos_ticket()

client, transport = hbase_connection()
transport.open()

print(client.getTableNames())

###########
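
If `kinit` fails (for example, because of a wrong principal or an unreadable keytab), the problem otherwise only surfaces later, when the transport is opened. Below is a minimal sketch of a variant that checks the exit status. `kinit_from_keytab` and its `run` parameter are hypothetical names introduced for illustration. The `run` hook is there only so the function can be exercised without a real KDC.

```python
import subprocess
from typing import Callable, List


def kinit_from_keytab(keytab: str, principal: str,
                      run: Callable[[List[str]], int] = subprocess.call) -> None:
    """Obtain a ticket with `kinit -kt <keytab> <principal>`; raise if it fails.

    Hypothetical helper: `run` defaults to subprocess.call and can be replaced
    by any callable that takes the command list and returns an exit code.
    """
    # Passing a list avoids shell quoting problems with paths or principals.
    command = ["kinit", "-kt", keytab, principal]
    exit_code = run(command)
    if exit_code != 0:
        raise RuntimeError("kinit failed with exit code %d" % exit_code)
```

It could be called as `kinit_from_keytab(keyTab, properties["principal"])` before opening the transport.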

 

I found a comment in TTransport.py suggesting that TSaslClientTransport is meant to wrap a TSocket:

https://github.com/apache/thrift/blob/master/lib/py/src/transport/TTransport.py 

TTransport.TSaslClientTransport

"transport: an underlying transport to use, typically just a TSocket"

 

We tried to use THttpClient instead:

https://github.com/apache/thrift/blob/master/lib/py/src/transport/THttpClient.py

THttpClient.THttpClient(url)

but it cannot be wrapped in TTransport.TSaslClientTransport for SASL Kerberos authentication.

 

Could you please advise whether Python can be used with Cloudera-managed Kerberos HBase Thrift over HTTPS, or suggest an alternative way to connect to Kerberized HBase from Python?

 

Thanks,

Manjil

2 REPLIES

Explorer

I wanted to post the solution in case it is helpful to anyone else.


### Client side python packages

  • six 1.15.0
  • thrift 0.13.0
  • hbase-thrift 0.20.4
  • pykerberos 1.2.1

 

### Python code
# Prerequisite: kinit has been run and a Kerberos ticket is available for the user
# HBase Thrift is running in HTTP protocol secure mode
# The Python code uses the ticket from the local Kerberos cache,
# adds the Kerberos context to the HTTP header,
# and performs HBase client operations such as listing tables, table scans, etc.
#
# Important: a session opened on the httpClient transport is valid for one call only.
#            For the next HBase operation, get a new Kerberos context (krb_context),
#            set the header again, and reopen the session.
##


import kerberos
from thrift import Thrift
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client
import ssl


def kerberos_auth():
    hbaseService="<hbase>/<HOST>@<DOMAIN.COM>"
    # the service can be hbase or HTTP, depending on the HBase Thrift configuration
    clientPrincipal="<user>@<DOMAIN.COM>"
    __, krb_context = kerberos.authGSSClientInit(hbaseService, principal=clientPrincipal)
    kerberos.authGSSClientStep(krb_context, "")
    negotiate_details = kerberos.authGSSClientResponse(krb_context)
    headers = {'Authorization': 'Negotiate ' + negotiate_details,'Content-Type':'application/binary'}
    return headers

# ssl._create_unverified_context() disables SSL verification; use it only if no verification is required
httpClient = THttpClient.THttpClient(
    'https://<THRIFT_HOST>:9090/',
    cert_file='<client cert file path>.crt',
    key_file='<client cert key file path>.key',
    ssl_context=ssl._create_unverified_context()
)
# for new session start
httpClient.setCustomHeaders(headers=kerberos_auth())
protocol = TBinaryProtocol.TBinaryProtocol(httpClient)
httpClient.open()
client = Client(protocol)
# for new session end
client.getTableNames()
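
The note above about needing a fresh Kerberos context for every HBase operation could be wrapped in a small helper. This is only a sketch: `call_with_fresh_auth` is a hypothetical name, not part of thrift or hbase-thrift. It assumes the transport offers `setCustomHeaders()` and `open()`, as `THttpClient` does. It also assumes `make_headers` behaves like `kerberos_auth` above, returning a new Authorization header on each call.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def call_with_fresh_auth(transport,
                         make_headers: Callable[[], Dict[str, str]],
                         operation: Callable[[], T]) -> T:
    """Attach newly generated auth headers, reopen the transport, run one call.

    Hypothetical helper: `transport` is assumed to expose setCustomHeaders()
    and open(); `make_headers` is assumed to return fresh headers every time.
    """
    # A new Negotiate token is generated for each operation.
    transport.setCustomHeaders(make_headers())
    transport.open()
    return operation()
```

With the objects defined above, a call might look like `tables = call_with_fresh_auth(httpClient, kerberos_auth, client.getTableNames)`. For operations that take arguments, wrap them in a `lambda`.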

 

Thank you,

Manjil

Master Collaborator

@manjilhk Thanks for sharing this awesome solution using the THttpClient transport. Can you let us know which CDH version you are on?

In CDH 6.x, TSaslClientTransport works, but in the initial CDP releases a code change causes this transport to fail when communicating with a secured cluster.

We have released a hotfix for this issue. If the KB below matches your issue, please raise a Cloudera case to request the hotfix; otherwise you will need to wait for the future release 7.2.11, which will include the fix.

Please see this KB that I posted:

https://my.cloudera.com/knowledge/Cannot-connect-to-HBase-Thrift-from-Python-scripts-after?id=318921

 

- Will Xiao, Support Engineer