Member since 07-19-2018 | 8 Posts | 0 Kudos Received | 1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
5997 | 08-21-2018 10:08 AM |
06-07-2021
12:55 AM
I wanted to update the solution; it may be helpful if anyone wants to use it.
### Client side python packages
six 1.15.0
thrift 0.13.0
hbase-thrift 0.20.4
pykerberos 1.2.1
### Python code
# Prerequisite: kinit has been run and a Kerberos ticket is available for the user
# HBase Thrift is running in HTTP protocol secure mode
# This Python code uses the ticket in the local Kerberos cache,
# adds the Kerberos context to the HTTP header,
# and performs HBase client operations such as get table names, table scan, etc.
#
# Important: the session opened by the httpClient transport is valid for only one call.
# For the next HBase operation, get a new Kerberos context (krb_context), set the header again, and reopen the session.
##
import kerberos
from thrift import Thrift
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol
from hbase.Hbase import Client
import ssl
def kerberos_auth():
    # The service can be hbase or HTTP, depending on the HBase Thrift configuration
    hbaseService = "<hbase>/<HOST>@<DOMAIN.COM>"
    clientPrincipal = "<user>@<DOMAIN.COM>"
    __, krb_context = kerberos.authGSSClientInit(hbaseService, principal=clientPrincipal)
    kerberos.authGSSClientStep(krb_context, "")
    negotiate_details = kerberos.authGSSClientResponse(krb_context)
    headers = {'Authorization': 'Negotiate ' + negotiate_details, 'Content-Type': 'application/binary'}
    return headers
# Use ssl._create_unverified_context() only if no SSL verification is required
httpClient = THttpClient.THttpClient(
    'https://<THRIFT_HOST>:9090/',
    cert_file='<client cert file path>.crt',
    key_file='<client cert key file path>.key',
    ssl_context=ssl._create_unverified_context())
# for new session start
httpClient.setCustomHeaders(headers=kerberos_auth())
protocol = TBinaryProtocol.TBinaryProtocol(httpClient)
httpClient.open()
client = Client(protocol)
# for new session end
client.getTableNames()

Thank you,
Manjil
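The note above says the opened session is good for only one call, so the Negotiate header has to be refreshed before each HBase operation. A minimal sketch of one way to wrap that pattern follows. The names `build_negotiate_headers` and `PerCallAuth` are hypothetical helpers, not part of thrift, hbase-thrift, or pykerberos; the token provider is assumed to be a function that runs the three `kerberos.authGSSClient*` calls from the post and returns the response string.

```python
from typing import Any, Callable, Dict


def build_negotiate_headers(token: str) -> Dict[str, str]:
    """Build the HTTP headers that carry a Negotiate token (same keys as the post)."""
    return {
        "Authorization": "Negotiate " + token,
        "Content-Type": "application/binary",
    }


class PerCallAuth:
    """Hypothetical helper: install a fresh Negotiate header before every call."""

    def __init__(self,
                 set_headers: Callable[[Dict[str, str]], None],
                 token_provider: Callable[[], str]) -> None:
        # set_headers is assumed to forward to the transport's header setter
        self._set_headers = set_headers
        # token_provider is assumed to return a new Kerberos token each time
        self._token_provider = token_provider

    def call(self, operation: Callable[..., Any], *args: Any) -> Any:
        # Fetch a new token and set the headers, then run the operation
        self._set_headers(build_negotiate_headers(self._token_provider()))
        return operation(*args)
```

With the objects from the post, this could be used roughly as `PerCallAuth(lambda h: httpClient.setCustomHeaders(headers=h), get_token).call(client.getTableNames)`, where `get_token` is an assumed function wrapping the Kerberos calls.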
01-27-2021
09:52 PM
Hi Community members,

We have a Python 3 application that connects to HBase and fetches data. The connectivity was working fine with the Kerberos HBase Thrift binary protocol (over TSocket) until the Hadoop team moved the Hadoop system to Cloudera and Cloudera Manager, which starts the Kerberos HBase Thrift server in HTTPS mode. Now the protocol has changed from TSocket to HTTP/HTTPS, and the Python code cannot authenticate using the HTTP client with SASL Kerberos.

The current Python version is 3.6.8 and the package versions are:
thrift=0.13.0
hbase-thrift=0.20.4
pure_sasl=0.5.1

Working code in TSocket mode:
############
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase
from hbase.ttypes import *
import jprops
from subprocess import call, check_output

# read cluster.properties
with open('/data/properties/cluster.properties') as fp:
    properties = jprops.load_properties(fp)

# kerberos ticket
def kerberos_ticket():
    principal = properties["principal"]
    # keyTab was not defined in the original post; the property name "keyTab" is an assumption
    keyTab = properties["keyTab"]
    kinitCommand = "kinit" + " " + "-kt" + " " + keyTab + " " + principal
    call(kinitCommand, shell=True)
    return

# Hbase connection
def hbase_connection():
    # get hbase data
    thriftHost = properties["thriftHost"]
    hbaseService = properties["hbaseService"]
    Tsock = TSocket.TSocket(thriftHost, 9090)
    Tsock.setTimeout(2000000)  # timeout in milliseconds
    transport = TTransport.TSaslClientTransport(
        Tsock,
        host=thriftHost,
        service=hbaseService,
        mechanism='GSSAPI'
    )
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = Hbase.Client(protocol)
    return client, transport

# get kerberized ticket
kerberos_ticket()
client, transport = hbase_connection()
transport.open()
print(client.getTableNames())
###########

I found a comment in TTransport.py saying that TTransport.TSaslClientTransport is typically used with TSocket:
https://github.com/apache/thrift/blob/master/lib/py/src/transport/TTransport.py
"transport: an underlying transport to use, typically just a TSocket"

We tried to use THttpClient.THttpClient(url) from
https://github.com/apache/thrift/blob/master/lib/py/src/transport/THttpClient.py
but it cannot be used in TTransport.TSaslClientTransport for SASL Kerberos.

Please advise whether Python can be used at all with Cloudera-managed Kerberos HBase Thrift over HTTPS, and suggest any alternative method to connect to HBase (Kerberos) using Python.

Thanks,
Manjil
Labels: Apache HBase, Kerberos
08-21-2018
10:08 AM
Update: We found that a .hiverc file was used by the hive user for Hive CLI, which explains the difference:
hive.exec.scratchdir=/user/hive/scratch
hive.exec.stagingdir=/user/hive/staging

The issue is that the HDFS /user/hive directory is encrypted with Ranger, while the HDFS /tmp/hive directory is not encrypted and can be read and written by all users in the hadoop group.

hive-site.xml
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>hive\.exec\.scratchdir|hive\.exec\.stagingdir</value>
  <description>append conf property in white list followed by pipeline</description>
</property>
Restart the metastore and hiveserver.

I tested with Beeline with the session-level change below. The execution is fast, like Hive CLI.
hive.exec.scratchdir=/user/hive/scratch
hive.exec.stagingdir=/user/hive/staging

I tested with Hive CLI with the session-level change below. The execution is slow, with a MapReduce job for moving the data.
hive.exec.scratchdir=/tmp/hive/scratch
hive.exec.stagingdir=/tmp/hive/staging

So the root cause is that data is encrypted in /user/hive and not encrypted in /tmp/hive. The solution is to make a session-level change so that the scratch and staging directories are in the same encryption zone as the target.

The INFO log below is printed when the encryption zones are different:
metadata.Hive: Copying source hdfs://edhcluster/tmp/hive/staging_hive_2018-08-07_16-29-12_750_8973639287951385407-1/-ext-10000/000001_0 to hdfs://edhcluster/user/hive/warehouse/temp_tro/000001_0 because HDFS encryption zones are different.

Thanks,
Manjil
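The root cause above comes down to whether the staging directory and the target table directory sit in the same encryption zone. The following is only a rough sketch of that comparison, assuming the zone roots are already known (for example, collected by an administrator from `hdfs crypto -listZones`). The function names are hypothetical, and the prefix matching is a simplification of how HDFS itself resolves zones.

```python
import posixpath
from typing import Iterable, Optional


def encryption_zone(path: str, zone_roots: Iterable[str]) -> Optional[str]:
    """Return the longest zone root that contains path, or None if unencrypted."""
    normalized = posixpath.normpath(path)
    best = None
    for root in zone_roots:
        root = posixpath.normpath(root)
        # Match the root itself or anything below it, but not sibling names
        inside = normalized == root or normalized.startswith(root.rstrip("/") + "/")
        if inside and (best is None or len(root) > len(best)):
            best = root
    return best


def needs_cross_zone_copy(staging_dir: str, target_dir: str,
                          zone_roots: Iterable[str]) -> bool:
    """True when the two directories resolve to different zones (a copy, not a rename)."""
    roots = list(zone_roots)
    return encryption_zone(staging_dir, roots) != encryption_zone(target_dir, roots)
```

With `/user/hive` as the only zone root, a staging directory under `/tmp/hive` and a warehouse directory under `/user/hive` resolve to different zones, which matches the slow copy path described in the post.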
08-21-2018
08:37 AM
We can allow the parameter to be changed at session level by adding an entry to hive-site.xml.

Example for the property hive.exec.scratchdir:

hive-site.xml
== ADD
<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>hive\.exec\.scratchdir|</value>
  <description>append conf property in white list followed by pipeline</description>
</property>
==
Restart the metastore and hiveserver.

beeline> set hive.exec.scratchdir=/user/hive/scratch;
beeline> set hive.exec.scratchdir;

Hope it helps if someone is looking for the same.

Thanks,
Manjil
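The whitelist value is a regular expression in which each property name has its dots escaped and the names are separated by a pipe. As a small illustration, the value could be generated from plain property names like this. The function name `whitelist_append_value` is hypothetical, and Python's `re.escape` is used here only because it produces the same `\.` escaping shown in the post.

```python
import re
from typing import Iterable


def whitelist_append_value(property_names: Iterable[str]) -> str:
    """Join property names into a pipe-separated regex with literal dots escaped."""
    return "|".join(re.escape(name) for name in property_names)


value = whitelist_append_value(["hive.exec.scratchdir", "hive.exec.stagingdir"])
print(value)
```

Escaping the dots keeps the pattern from matching unrelated property names in which some other character appears where the dot should be.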
08-20-2018
06:34 AM
Dear @Vinicius Higa Murakami,

Sorry for the late response. I just got the property differences between Hive CLI and Beeline from the client machine. The differences are hive.exec.scratchdir and hive.exec.stagingdir. I have uploaded the snapshot.

I tried this method to find the hive-site.xml used by Hive CLI, but grep returned no output:
hive --hiveconf hive.root.logger=DEBUG,console -e '' 2>&1 | grep hive-site.xml

Please suggest how to make the hive-site.xml configuration the same for both executions.

Thanks and regards,
Manjil
08-10-2018
01:37 AM
Hi @Vinicius Higa Murakami,

Thanks for the response. The parameter mentioned is used only for Beeline, because the job was failing in the MapReduce copy job with an error that the virtual memory used was 18G while 16.2G was allocated in YARN.

To explain more about the difference observed in the Hive CLI and Beeline logs: the HDFS temp directories used are different. Is there any configuration we need to modify to make them the same?

Hive CLI: hdfs://edhcluster/user/hive/staging_hive_2018-08-07_18-22-53_167_2618699013418541798-1/-ext-10001
Beeline: hdfs://edhcluster/tmp/hive/staging_hive_2018-08-07_16-29-12_750_8973639287951385407-1/-ext-10001

Hive CLI log:
2018-08-07T18:22:56,601 INFO [main] exec.Utilities: Setting plan: /tmp/hive/scratch/hive/a501276d-2015-435b-85c5-4d40534ac162/hive_2018-08-07_18-22-53_167_2618699013418541798-1/hive/_tez_scratch_dir/d5cc1718-38b1-49ba-a97e-ab9f78415b62/map.xml
2018-08-07T18:22:56,669 INFO [main] fs.FSStatsPublisher: created : hdfs://edhcluster/user/hive/staging_hive_2018-08-07_18-22-53_167_2618699013418541798-1/-ext-10001
2018-08-07T18:22:56,686 INFO [main] client.TezClient: Submitting dag to TezSession, sessionName=HIVE-a501276d-2015-435b-85c5-4d40534ac162, applicationId=application_1533623337748_0376, dagName=insert into default.t...db.temp_large_table3(Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID,

Beeline log:
2018-08-07T16:29:13,903 INFO [HiveServer2-Background-Pool: Thread-1549] exec.Utilities: Setting plan: /tmp/hive/scratch/hive/0887b266-675a-4fb2-8c85-3a27ebb3b9fc/hive_2018-08-07_16-29-12_750_8973639287951385407-3/hive/_tez_scratch_dir/6f4620d8-310c-4aff-bbe8-6f69ea9d1341/map.xml
2018-08-07T16:29:13,934 INFO [HiveServer2-Background-Pool: Thread-1549] fs.FSStatsPublisher: created : hdfs://edhcluster/tmp/hive/staging_hive_2018-08-07_16-29-12_750_8973639287951385407-1/-ext-10001
2018-08-07T16:29:13,938 INFO [HiveServer2-Background-Pool: Thread-1549] client.TezClient: Submitting dag to TezSession, sessionName=HIVE-e2dfe4df-37f0-4d95-946d-30557075f807, applicationId=application_1533623337748_0148, dagName=insert into default.t...db.temp_large_table3(Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=hive_20180807162912_519c1503-c151-4da7-b5a2-bd067e9c42b9 }

Thanks,
Manjil
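The difference between the two logs is easiest to see by pulling out the staging directory from the `fs.FSStatsPublisher: created :` lines. A rough sketch of doing that in Python is shown here. The function name `staging_parents` is hypothetical, and the regular expression assumes the log line format seen in this post.

```python
import re
from typing import List
from urllib.parse import urlparse

# Matches the URI that follows "fs.FSStatsPublisher: created :" in the logs above
CREATED_RE = re.compile(r"fs\.FSStatsPublisher: created : (\S+)")


def staging_parents(log_text: str) -> List[str]:
    """Return the HDFS directory that holds each staging_hive_* directory."""
    parents = []
    for uri in CREATED_RE.findall(log_text):
        kept = []
        for segment in urlparse(uri).path.split("/"):
            # Stop at the generated staging directory name
            if segment.startswith("staging_hive_"):
                break
            kept.append(segment)
        parents.append("/".join(kept) or "/")
    return parents
```

Running it over both log files and comparing the results shows directly which configuration each client picked up for the staging directory.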
08-07-2018
09:32 AM
When executing an insert into an empty table from a large table with millions of records (20 GB in size), the execution is different in Hive CLI and Beeline.

Hive CLI: It creates two TEZ jobs in YARN, maybe a mapper and a reducer, and completes in approximately 413 sec.
Beeline: It creates a first TEZ job in YARN, and the others are MAPREDUCE jobs, more than 150 of them, and it takes almost 2 hours.

Is this the expected behavior of HiveServer2/Beeline for a TEZ job, given that it internally creates MAPREDUCE jobs?

Environment details:
Hive version: 2.1.1
Tez version: 0.8.5

Attachments: hive-cli.txt, beeline-jdbc-hs2.txt

Hive common settings:
hive.execution.engine=tez
hive.mv.files.thread=0

Beeline settings:
tez.am.resource.memory.mb=20000
mapreduce.map.memory.mb=20000
hive.vectorized.execution.reduce.enabled=false;

Hive CLI and Beeline logs are uploaded.

Thanks in advance.
Labels: Apache Hive, Apache Tez