Support Questions

Find answers, ask questions, and share your expertise

How to connect to Hive using Python for pytest database testing

New Contributor

Hello Everyone, 

I am a Data Quality professional at a company, and currently, I am developing automated data testing. However, at this initial stage, I am facing difficulties in connecting Python to Cloudera/Hue/Hive.

In manual scenarios, I usually connect to Cloudera/Hue through a web portal like this: https://demo.gethue.com/hue/editor/?type=hive.

Is there anyone here who can assist me?


Thank you

 

6 REPLIES

Community Manager

@Maulz Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our Hive experts @james_jones @ggangadharan  who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

Super Collaborator

@Maulz - Check this document for how to query Hive from Python 3. You don't need Hue for this.  https://docs.cloudera.com/cdsw/1.10.5/import-data/topics/cdsw-accessing-data-from-apache-hive.html

This example uses Kerberos, but you can change or remove the authentication settings depending on your authentication requirements.
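For reference, the pattern in that document looks roughly like this (a minimal sketch; the hostname, keytab, and principal below are placeholders, not values from your cluster):

```python
# Hypothetical sketch: querying Hive from Python 3 with Kerberos via PyHive.

def query_hive_kerberos(host="hiveserver2.example.com", port=10000):
    # Deferred import so the sketch reads standalone:
    # pip install pyhive thrift sasl thrift-sasl
    from pyhive import hive

    # A valid Kerberos ticket must already exist, e.g.:
    #   kinit -kt user.keytab user@EXAMPLE.COM
    conn = hive.Connection(
        host=host,
        port=port,
        auth="KERBEROS",
        kerberos_service_name="hive",  # first component of the Hive service principal
    )
    cursor = conn.cursor()
    cursor.execute("SHOW DATABASES")
    rows = cursor.fetchall()
    cursor.close()
    conn.close()
    return rows
```

Drop the `auth` and `kerberos_service_name` arguments (or switch `auth` to another mode) if your cluster is not Kerberized.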

 

New Contributor

@james_jones 

I understand what you mean, but can I access it using Python with the LDAP method like Hue?

My reason: with Kerberos, I run into issues creating a ticket in MIT Kerberos because it conflicts with my Microsoft AD user. So I am looking for an alternative that does not use MIT Kerberos.

Is it possible?

Super Collaborator

@Maulz - You can use Knox as a proxy through the cdp-proxy-api topology to connect to Hive or Impala with basic authentication (username/password), like Hue. Using "cdp-proxy-api" assumes Knox is configured for basic authentication instead of SAML, JWT, etc. If it's not, you can manually create a new topology for basic authentication with Hive. Here's how to enable Knox to expose Hive with cdp-proxy-api if it's not configured yet (you may need to select "all" for the transport mode):

https://docs.cloudera.com/cdp-private-cloud-base/7.3.1/securing-hive/topics/hive_secure_knox.html
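A minimal sketch of connecting through Knox from Python, assuming PyHive and Thrift are installed; the gateway URL and credentials are placeholders. Knox fronts HiveServer2 over HTTP transport, so the client needs an HTTP Thrift transport plus a Basic auth header:

```python
import base64

def knox_hive_connect(user, password,
                      url="https://knox.example.com:8443/gateway/cdp-proxy-api/hive"):
    # Deferred imports: pip install pyhive thrift
    from pyhive import hive
    from thrift.transport import THttpClient

    # Use an HTTP Thrift client and attach a Basic auth header to every request.
    transport = THttpClient.THttpClient(url)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    transport.setCustomHeaders({"Authorization": f"Basic {token}"})
    return hive.connect(thrift_transport=transport)

# The Basic header value is just base64("user:password"):
token = base64.b64encode(b"alice:secret").decode()
```

The exact gateway path depends on how your Knox topology is named; check it in Cloudera Manager.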

It sounds like you have both AD and an MIT KDC. If the user you want to connect to Hive with is in the MIT KDC realm, not AD, you can use a different krb5.conf file, or you can set up a one-way trust between AD and the MIT KDC.

You can create your own krb5.conf configured for the MIT KDC (or even configured for the one-way trust, but the trust has to be established between the AD and MIT KDCs):
export KRB5_CONFIG=/path/to/your/custom_krb5.conf

You can use the default ticket cache file in krb5.conf:
default_ccache_name = FILE:/tmp/krb5cc_%{uid}
or set it as an environment variable (KRB5CCNAME).
You'll need these set for Python as well.
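In Python, for example, those variables can be set on the process before any Kerberos-aware library loads (the config path is the placeholder from above; the cache path is illustrative):

```python
import os

# Point the Kerberos libraries at a custom config and ticket cache.
# Set these *before* importing pyhive/sasl, which load libkrb5.
os.environ["KRB5_CONFIG"] = "/path/to/your/custom_krb5.conf"
os.environ["KRB5CCNAME"] = "FILE:/tmp/krb5cc_dataqa"  # illustrative cache path
```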

 

Community Manager

@Maulz Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.


Regards,

Diana Torres,
Community Moderator



Master Mentor

@Maulz 
Connecting Python to Cloudera, Hive, and Hue involves using libraries and drivers that interface with HiveServer2, the service that allows remote clients to execute Hive queries. There are several methods to connect Python to Cloudera's ecosystem, particularly to access Hive tables through Hue. I'll detail the most common approaches.

Prerequisites

  • Cloudera/Hadoop Cluster: Ensure HiveServer2 is running on your cluster.
  • Default HiveServer2 port: 10000 (verify via Cloudera Manager).
  • Python Environment: Python 3.6+ installed.
  • Authentication: Know your authentication method:
    • Username/password (non-secure).
    • Kerberos (common in enterprise clusters).
    • LDAP.

Below is a detailed, step-by-step guide:

1. Install Required Python Libraries

Use pip to install:

pip install pyhive # Python interface for Hive
pip install thrift # Thrift protocol support
pip install sasl # SASL authentication (for Kerberos)
pip install thrift-sasl # SASL wrapper for Thrift
pip install pykerberos # Kerberos support (if needed)

For Impala connections (used in the Impala example below):

pip install impyla # Python client for Impala

For JDBC-based connections (alternative method):

pip install JayDeBeApi # JDBC bridge

2. Configure Cloudera/Hive

Via Cloudera Manager:

  1. Enable HiveServer2 and ensure it’s running.
  2. Check HiveServer2 Port (default: 10000).
  3. If using Kerberos:
  • Ensure Kerberos is configured in Cloudera.
  • Obtain a Kerberos ticket with your keytab:
kinit -kt <keytab_file> <principal>

3. Connecting Python to Cloudera/Hue/Hive

1. Using PyHive, a Python library designed specifically to work with Hive:

from pyhive import hive

# Connect to Hive server
conn = hive.Connection(
    host='cloudera_host_name',
    port=10000,              # Default HiveServer2 port
    username='your_username',
    password='your_password',
    database='default',      # Your database name
    auth='LDAP'              # Or 'NONE', 'KERBEROS', 'CUSTOM' depending on your setup
)

# Create a cursor
cursor = conn.cursor()

# Execute a query
cursor.execute('SELECT * FROM your_table LIMIT 10')

# Fetch results
results = cursor.fetchall()
print(results)

# Close connections
cursor.close()
conn.close()

2. Using the Impala Connection

If your Cloudera cluster uses Impala:

from impala.dbapi import connect   # pip install impyla

conn = connect(
    host='cloudera_host_name',
    port=21050,              # Default Impala port
    user='your_username',
    password='your_password',
    database='default'       # Your database name
)

cursor = conn.cursor()
cursor.execute('SELECT * FROM your_table LIMIT 10')
results = cursor.fetchall()
print(results)

cursor.close()
conn.close()

3. Integration with Hue

Hue is a web UI for Hadoop, but you can programmatically interact with Hive via its APIs (limited). For direct Python-Hue integration:

  1. Use Hue’s REST API to execute queries:

import requests

# Hue API endpoint (replace with your Hue server URL)
url = "http://<hue_server>:8888/hue/notebook/api/execute/hive"

headers = {"Content-Type": "application/json"}
data = {
    "script": "SELECT * FROM my_table",
    "dialect": "hive"
}

response = requests.post(
    url,
    auth=('<hue_username>', '<hue_password>'),
    headers=headers,
    json=data
)
print(response.json())

4. Troubleshooting

  • Common Issues:
    • Connection Refused:
      • Verify HiveServer2 is running (netstat -tuln | grep 10000).
      • Check firewall rules.
    • Authentication Failures:
      • For Kerberos: Ensure kinit succeeded.
      • For LDAP: Validate credentials.
    • Thrift Version Mismatch:
      • Use Thrift v0.13.0 with Hive 3.x.
  • Logs:
    • Check HiveServer2 logs in Cloudera Manager (/var/log/hive).
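Before digging into authentication errors, a quick reachability check can rule out network problems. A stdlib-only sketch (host and port are placeholders):

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds,
    e.g. port_open('cloudera_host_name', 10000) for HiveServer2."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False, fix connectivity (service down, wrong port, firewall) before touching Kerberos or LDAP settings.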

5. Best Practices

  • Use connection pooling for high-frequency queries.
  • For Kerberos, automate ticket renewal with kinit cron jobs.
  • Secure credentials using environment variables or Vault.
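For the last point, a minimal sketch of reading credentials from environment variables (the names HIVE_USER/HIVE_PASSWORD are illustrative, not a convention of any library):

```python
import os

def hive_credentials():
    # Read credentials from the environment instead of hard-coding them.
    user = os.environ.get("HIVE_USER")
    password = os.environ.get("HIVE_PASSWORD")
    if not user or not password:
        raise RuntimeError("Set HIVE_USER and HIVE_PASSWORD before connecting")
    return user, password

# Demo only; in practice these are exported in the shell or injected by Vault.
os.environ["HIVE_USER"] = "demo"
os.environ["HIVE_PASSWORD"] = "demo-secret"
user, password = hive_credentials()
```

The returned pair can then be passed to hive.Connection(username=..., password=...) without credentials ever appearing in source control.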

Happy hadooping