Support Questions

Find answers, ask questions, and share your expertise

Python connection on a hive 3 server

avatar
New Contributor

Hello guys,

I currently work on a company that does provide hive 3.1 servers by using knox or zookeeper (kerberos) authentication methods. I am able to connect and query data by using a odbc connection on my personal computer. 

Now I need to set the connection on a virtual jupiter notebook server with pyodbc , so, I am not able to install the ODBC (and probably the server is based on Linux anyway).

I´ve seen many options as pyhive, implya and others, but everything I find is regarding Hive 2 servers.

So, what could I use to set the python connection to the Hive 3 servers?

1 ACCEPTED SOLUTION

avatar
Master Collaborator

@frbelotto I have not tried with pyhive, I think it requires additional modules if you want to connect using zookeeper quorum. But you could use jaydebeapi python module to connect to Hive3. It works for any type of connection string knox/ZK.  You would require Hive driver that you could download from here.

An example on how to to make use of jaydebeapi module to connect to Hive:

import jaydebeapi

# Connection parameters
jdbc_url = 'jdbc:hive2://knox.host:8443/default'  # JDBC URL for HiveServer2
username = 'your-username'
password = 'your-password'
jar_file = '/path/to/hive-jdbc-driver.jar'  # Path to the Hive JDBC driver JAR file

# Establish connection to Hive
conn = jaydebeapi.connect(
    'org.apache.hive.jdbc.HiveDriver',
    jdbc_url,
    [username, password],
    jar_file
)

# Create cursor
cursor = conn.cursor()

# Execute Hive query
cursor.execute('SELECT * FROM hive_table')

# Fetch results
result = cursor.fetchall()

# Close cursor and connection
cursor.close()
conn.close()

View solution in original post

10 REPLIES 10

avatar
Master Collaborator

@frbelotto I have not tried with pyhive, I think it requires additional modules if you want to connect using zookeeper quorum. But you could use jaydebeapi python module to connect to Hive3. It works for any type of connection string knox/ZK.  You would require Hive driver that you could download from here.

An example on how to to make use of jaydebeapi module to connect to Hive:

import jaydebeapi

# Connection parameters
jdbc_url = 'jdbc:hive2://knox.host:8443/default'  # JDBC URL for HiveServer2
username = 'your-username'
password = 'your-password'
jar_file = '/path/to/hive-jdbc-driver.jar'  # Path to the Hive JDBC driver JAR file

# Establish connection to Hive
conn = jaydebeapi.connect(
    'org.apache.hive.jdbc.HiveDriver',
    jdbc_url,
    [username, password],
    jar_file
)

# Create cursor
cursor = conn.cursor()

# Execute Hive query
cursor.execute('SELECT * FROM hive_table')

# Fetch results
result = cursor.fetchall()

# Close cursor and connection
cursor.close()
conn.close()

avatar
New Contributor

Sorry, I couldn't get it working. 

I am using a company owned Jupiter notebook.

The first issue is that it's not finding the org.apache.hive.jdbc.HiveDriver.

I even tried to upload an driver version to my Jupiter space and set it as path but still does not find. 

My second question, that I couldn't even test, is how to set it to use kerberos as authentication.

avatar
Master Collaborator

My bad. If you are using the Cloudera JDBC jar, the driver class should be com.cloudera.hive.jdbc.HS2Driver.

As we are talking about Kerberos authentication, you should get a kerberos ticket in the client machine first, and use jdbc_url as follows:

jar_file = '/path/to/hive-jdbc.jar'
jdbc_url = 'jdbc:hive2://{server}:{port}/default;principal={principal}'

# Connect to Hive
conn = jaydebeapi.connect('com.cloudera.hive.jdbc.HS2Driver', jdbc_url, ['', ''], jar_file)
cursor = conn.cursor()

avatar
New Contributor

I will try. An extra question. I don't have admin privileges on the machine, só. Could I just upload the jar driver file to my ipython instance or I need to "install" it on someway ? 

avatar
Master Collaborator

@frbelotto It does not require any installation. You just need to share the path to the Driver jar file.

avatar
Community Manager

@frbelotto, Did the response assist in resolving your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future. 



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
New Contributor

Hi @smruti

Thanks for your contirubution.

I followed your instructions for configuring a Hive connection in Python via JupyterHub. Unfortunately, I encountered the following error:

ERROR - The kernel appears to have died. It will restart automatically.

Have you encountered this error before?

 

Thanks,

Farman

avatar
Master Collaborator

@ZainK This can happen due to various reasons such as resource constraints, or errors in the code. Do check those aspects. 

avatar
Super Collaborator

@frbelotto , a sample code to test connectivity from Python to Hive. 

from pyhive import hive
import logging
import os

logging.basicConfig(level=logging.DEBUG)
conn = hive.Connection(host="hs_host",port=10000,database='default')

cursor = conn.cursor()
cursor.execute("show databases")
res = cursor.fetchall()
print (res)


#connect = hive.Connection(host=db_host, port=db_port, username=db_user, password=db_password, database=db_name,auth='KERBEROS',kerberos_service_name='hive')

make sure to install the below dependencies before running the code:

pip3.8 install pyhive
pip3.8 install thrift
pip3.8 install thrift_sasl

I hope this helps! Cheers!