Created 02-16-2024 09:16 PM
Hello guys,
I currently work on a company that does provide hive 3.1 servers by using knox or zookeeper (kerberos) authentication methods. I am able to connect and query data by using a odbc connection on my personal computer.
Now I need to set the connection on a virtual jupiter notebook server with pyodbc , so, I am not able to install the ODBC (and probably the server is based on Linux anyway).
I´ve seen many options as pyhive, implya and others, but everything I find is regarding Hive 2 servers.
So, what could I use to set the python connection to the Hive 3 servers?
Created 02-18-2024 09:59 AM
@frbelotto I have not tried with pyhive, I think it requires additional modules if you want to connect using zookeeper quorum. But you could use jaydebeapi python module to connect to Hive3. It works for any type of connection string knox/ZK. You would require Hive driver that you could download from here.
An example on how to to make use of jaydebeapi module to connect to Hive:
import jaydebeapi
# Connection parameters
jdbc_url = 'jdbc:hive2://knox.host:8443/default' # JDBC URL for HiveServer2
username = 'your-username'
password = 'your-password'
jar_file = '/path/to/hive-jdbc-driver.jar' # Path to the Hive JDBC driver JAR file
# Establish connection to Hive
conn = jaydebeapi.connect(
'org.apache.hive.jdbc.HiveDriver',
jdbc_url,
[username, password],
jar_file
)
# Create cursor
cursor = conn.cursor()
# Execute Hive query
cursor.execute('SELECT * FROM hive_table')
# Fetch results
result = cursor.fetchall()
# Close cursor and connection
cursor.close()
conn.close()
Created 02-18-2024 09:59 AM
@frbelotto I have not tried with pyhive, I think it requires additional modules if you want to connect using zookeeper quorum. But you could use jaydebeapi python module to connect to Hive3. It works for any type of connection string knox/ZK. You would require Hive driver that you could download from here.
An example on how to to make use of jaydebeapi module to connect to Hive:
import jaydebeapi
# Connection parameters
jdbc_url = 'jdbc:hive2://knox.host:8443/default' # JDBC URL for HiveServer2
username = 'your-username'
password = 'your-password'
jar_file = '/path/to/hive-jdbc-driver.jar' # Path to the Hive JDBC driver JAR file
# Establish connection to Hive
conn = jaydebeapi.connect(
'org.apache.hive.jdbc.HiveDriver',
jdbc_url,
[username, password],
jar_file
)
# Create cursor
cursor = conn.cursor()
# Execute Hive query
cursor.execute('SELECT * FROM hive_table')
# Fetch results
result = cursor.fetchall()
# Close cursor and connection
cursor.close()
conn.close()
Created 02-23-2024 04:15 AM
Sorry, I couldn't get it working.
I am using a company owned Jupiter notebook.
The first issue is that it's not finding the org.apache.hive.jdbc.HiveDriver.
I even tried to upload an driver version to my Jupiter space and set it as path but still does not find.
My second question, that I couldn't even test, is how to set it to use kerberos as authentication.
Created 02-24-2024 05:31 PM
My bad. If you are using the Cloudera JDBC jar, the driver class should be com.cloudera.hive.jdbc.HS2Driver.
As we are talking about Kerberos authentication, you should get a kerberos ticket in the client machine first, and use jdbc_url as follows:
jar_file = '/path/to/hive-jdbc.jar'
jdbc_url = 'jdbc:hive2://{server}:{port}/default;principal={principal}'
# Connect to Hive
conn = jaydebeapi.connect('com.cloudera.hive.jdbc.HS2Driver', jdbc_url, ['', ''], jar_file)
cursor = conn.cursor()
Created 02-25-2024 05:56 PM
I will try. An extra question. I don't have admin privileges on the machine, só. Could I just upload the jar driver file to my ipython instance or I need to "install" it on someway ?
Created 02-26-2024 05:21 AM
@frbelotto It does not require any installation. You just need to share the path to the Driver jar file.
Created 02-22-2024 10:54 PM
@frbelotto, Did the response assist in resolving your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future.
Regards,
Vidya Sargur,Created 03-18-2024 03:31 AM
Hi @smruti
Thanks for your contirubution.
I followed your instructions for configuring a Hive connection in Python via JupyterHub. Unfortunately, I encountered the following error:
ERROR - The kernel appears to have died. It will restart automatically.
Have you encountered this error before?
Thanks,
Farman
Created 03-20-2024 02:59 AM
@ZainK This can happen due to various reasons such as resource constraints, or errors in the code. Do check those aspects.
Created 03-22-2024 01:20 AM
@frbelotto , a sample code to test connectivity from Python to Hive.
from pyhive import hive import logging import os logging.basicConfig(level=logging.DEBUG) conn = hive.Connection(host="hs_host",port=10000,database='default') cursor = conn.cursor() cursor.execute("show databases") res = cursor.fetchall() print (res) #connect = hive.Connection(host=db_host, port=db_port, username=db_user, password=db_password, database=db_name,auth='KERBEROS',kerberos_service_name='hive')
make sure to install the below dependencies before running the code:
pip3.8 install pyhive pip3.8 install thrift pip3.8 install thrift_sasl
I hope this helps! Cheers!