Member since
11-17-2021
1123
Posts
254
Kudos Received
29
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 541 | 11-05-2025 10:13 AM |
| | 326 | 10-16-2025 02:45 PM |
| | 650 | 10-06-2025 01:01 PM |
| | 576 | 09-24-2025 01:51 PM |
| | 475 | 08-04-2025 04:17 PM |
03-10-2025
05:27 AM
Try the query below and let us know whether it solves the problem, along with any other related issues you are facing:

select id, st.value street, ph.value phone
from table, table.result st, table.result ph
where st.key='street' and st.value='abc' and ph.key='phone' and ph.value='123'
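To make the logic of that query easier to follow, here is a minimal Python sketch of what the double join over `result` does. It assumes `result` is an array of key/value entries per row, as the `table.result` syntax suggests; the sample rows and the function name `match_street_and_phone` are hypothetical and only for illustration.

```python
# Hypothetical sample data: each row has an id and a list of key/value entries,
# mirroring the assumed shape of the "result" column in the query.
rows = [
    {"id": 1, "result": [{"key": "street", "value": "abc"},
                         {"key": "phone", "value": "123"}]},
    {"id": 2, "result": [{"key": "street", "value": "abc"},
                         {"key": "phone", "value": "999"}]},
    {"id": 3, "result": [{"key": "street", "value": "xyz"},
                         {"key": "phone", "value": "123"}]},
]


def match_street_and_phone(rows, street, phone):
    """Return (id, street, phone) for rows holding both requested values."""
    matches = []
    for row in rows:
        # "st" and "ph" play the role of the two aliases over table.result:
        # every pair of entries from the same row is considered.
        for st in row["result"]:
            for ph in row["result"]:
                if (st["key"] == "street" and st["value"] == street
                        and ph["key"] == "phone" and ph["value"] == phone):
                    matches.append((row["id"], st["value"], ph["value"]))
    return matches


print(match_street_and_phone(rows, "abc", "123"))  # [(1, 'abc', '123')]
```

Only row 1 is returned because it is the only row where the street entry and the phone entry both match. Aliasing `result` twice is what lets both conditions be checked against the same row.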
03-07-2025
03:24 PM
@Maulz Connecting Python to Cloudera, Hive, and Hue involves using libraries and drivers that interface with HiveServer2, the service that allows remote clients to execute Hive queries. There are several methods to connect Python to Cloudera's ecosystem, particularly to access Hive tables through Hue. I'll detail the most common approaches.

Prerequisites

- Cloudera/Hadoop cluster: Ensure HiveServer2 is running on your cluster. The default HiveServer2 port is 10000 (verify via Cloudera Manager).
- Python environment: Python 3.6+ installed.
- Authentication: Know your authentication method:
  - Username/password (non-secure)
  - Kerberos (common in enterprise clusters)
  - LDAP

Below is a detailed, step-by-step guide:

1. Install Required Python Libraries

Use pip to install:

    pip install pyhive        # Python interface for Hive
    pip install thrift        # Thrift protocol support
    pip install sasl          # SASL authentication (for Kerberos)
    pip install thrift-sasl   # SASL wrapper for Thrift
    pip install pykerberos    # Kerberos support (if needed)

For JDBC-based connections (alternative method):

    pip install JayDeBeApi    # JDBC bridge

2. Configure Cloudera/Hive

Via Cloudera Manager:

- Enable HiveServer2 and ensure it's running.
- Check the HiveServer2 port (default: 10000).

If using Kerberos:

- Ensure Kerberos is configured in Cloudera.
- Obtain a Kerberos ticket from your keytab:

    kinit -kt <keytab_file> <principal>

3. Connecting Python to Cloudera/Hue/Hive

3.1 Using PyHive

PyHive is a Python library specifically designed to work with Hive:

    from pyhive import hive

    # Connect to the Hive server
    conn = hive.Connection(
        host='cloudera_host_name',
        port=10000,                # Default HiveServer2 port
        username='your_username',
        password='your_password',
        database='default',        # Your database name
        auth='LDAP'                # Or 'NONE', 'KERBEROS', 'CUSTOM' depending on your authentication setup
    )

    # Create a cursor
    cursor = conn.cursor()

    # Execute a query
    cursor.execute('SELECT * FROM your_table LIMIT 10')

    # Fetch results
    results = cursor.fetchall()
    print(results)

    # Close connections
    cursor.close()
    conn.close()

3.2 Using the Impala Connection

If your Cloudera cluster uses Impala (this requires the impyla package: pip install impyla):

    from impala.dbapi import connect

    conn = connect(
        host='cloudera_host_name',
        port=21050,                # Default Impala port
        user='your_username',
        password='your_password',
        database='default'         # Your database name
    )

    cursor = conn.cursor()
    cursor.execute('SELECT * FROM your_table LIMIT 10')
    results = cursor.fetchall()
    print(results)

    cursor.close()
    conn.close()

3.3 Integration with Hue

Hue is a web UI for Hadoop, but you can programmatically interact with Hive via its APIs (limited). For direct Python-Hue integration, use Hue's REST API to execute queries. The endpoint path and authentication scheme vary by Hue version, so check them against your Hue documentation:

    import requests

    # Hue API endpoint (replace with your Hue server URL)
    url = "http://<hue_server>:8888/hue/notebook/api/execute/hive"
    headers = {"Content-Type": "application/json"}
    data = {
        "script": "SELECT * FROM my_table",
        "dialect": "hive"
    }

    response = requests.post(
        url,
        auth=('<hue_username>', '<hue_password>'),
        headers=headers,
        json=data
    )
    print(response.json())

4. Troubleshooting Common Issues

- Connection refused: Verify HiveServer2 is running (netstat -tuln | grep 10000). Check firewall rules.
- Authentication failures: For Kerberos, ensure kinit succeeded. For LDAP, validate credentials.
- Thrift version mismatch: Use Thrift v0.13.0 with Hive 3.x.
- Logs: Check HiveServer2 logs in Cloudera Manager (/var/log/hive).

5. Best Practices

- Use connection pooling for high-frequency queries.
- For Kerberos, automate ticket renewal with kinit cron jobs.
- Secure credentials using environment variables or Vault.

Happy hadooping
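As a rough illustration of two points from the post above (keeping credentials in environment variables, and ruling out a refused connection before digging into authentication), here is a minimal sketch. The environment variable names (`HIVE_HOST`, `HIVE_PORT`, `HIVE_USER`, `HIVE_PASSWORD`, `HIVE_DATABASE`, `HIVE_AUTH`) and the helper names are assumptions made for this example, not anything defined by Cloudera or PyHive.

```python
import os
import socket


def hive_settings_from_env(env=None):
    """Build hive.Connection keyword arguments from environment variables.

    The variable names used here are hypothetical; pick whatever fits
    your deployment.
    """
    env = os.environ if env is None else env
    auth = env.get("HIVE_AUTH", "LDAP")
    settings = {
        "host": env["HIVE_HOST"],
        "port": int(env.get("HIVE_PORT", "10000")),
        "username": env["HIVE_USER"],
        "database": env.get("HIVE_DATABASE", "default"),
        "auth": auth,
    }
    # PyHive accepts a password only in LDAP or CUSTOM mode.
    if auth in ("LDAP", "CUSTOM"):
        settings["password"] = env["HIVE_PASSWORD"]
    return settings


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def open_hive_connection(env=None):
    """Check reachability first, then connect with PyHive."""
    settings = hive_settings_from_env(env)
    if not port_is_reachable(settings["host"], settings["port"]):
        raise ConnectionError(
            f"Cannot reach {settings['host']}:{settings['port']}; "
            "check that HiveServer2 is running and the firewall allows it."
        )
    from pyhive import hive  # imported lazily so the helpers work without PyHive
    return hive.Connection(**settings)
```

Calling `open_hive_connection()` after exporting the variables in your shell would then replace the hard-coded credentials in the PyHive example. The reachability check only tells you the port is open, not that authentication will succeed.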
03-04-2025
02:34 AM
Thank you @abdulpasithali @DianaTorres for your response. I really appreciate it! I'll take some time to review the details, and hopefully, I won’t have more questions. 😉 Thanks again for your help!
03-03-2025
03:07 PM
Here are some highlights from the month of January
Coming on March 13th: The Power of Metadata: Build an interoperable data estate you can control and trust. Register Now!
Check out the FY25 Cloudera Meetup Events Calendar for upcoming & past event details!
713 new support questions
8 new community articles
731 new members
| Community Article | Author | Components/Labels |
|---|---|---|
| Fully Private Agents with Cloudera's AI Inference Service and CrewAI | @shresh | Cloudera Data Science and Engineering, Cloudera Machine Learning (CML) |
| Iceberg WAP – Failsafe ETL with Iceberg and CDE | @wdyson | Apache Iceberg, Cloudera Data Engineering (CDE), Cloudera Data Warehouse (CDW) |
| Getting Started with Spark GraphFrames in Cloudera AI Workbench | @pauldefusco | Apache Spark, Cloudera Data Platform (CDP), Cloudera Machine Learning (CML) |
| JupyterLab and Spark Connect Quickstart in Cloudera Data Engineering | | Apache Spark, Cloudera Data Engineering (CDE) |
| PyCharm and Spark Connect Quickstart in Cloudera Data Engineering | | Apache Spark, Cloudera Data Engineering (CDE) |
We would like to recognize the below community members and employees for their efforts over the last month to provide community solutions.
See all our top participants at Top Solution Authors leaderboard and all the other leaderboards on our Leaderboards and Badges page.
@MattWho @venkatsambath @vaishaakb @vikas @james_jones @Seaport @raph @amd0629 @wkwi @tuyen123
Share your expertise and answer some of the below open questions. Also, be sure to bookmark the unanswered question page to find additional open questions.
| Unanswered Community Post | Components/Labels |
|---|---|
| External zookeeper and nifi cluster connection issue | Apache NiFi, Apache Zookeeper |
| Parsing nested JSOn in HIVE | Apache Hive |
| Unable to run spark-shell command with k8s as master | Apache Spark |
| Apache Nifi IMAP/POP3 authentication Error | Apache NiFi |
| GenerateFlowFile "Failed to properly initialize Processor. If still scheduled to run" | Apache NiFi |
03-03-2025
01:24 AM
Writing the output back into the database results in the same error. However, I checked the physical and logical plans of these operations and noticed that Spark performs a "relation" operation on the second table, which is over 900 GB, reading all of its columns instead of only the subset used in the query. So I translated the whole code into SQL, and it returned the table as a DataFrame without any problem. Do you have any idea why Spark doesn't push down the filtering and column pruning operations?
02-28-2025
09:04 AM
@ajay32 Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
02-27-2025
08:09 AM
@JohnCarter Welcome to the Cloudera Community! To help you get the best possible solution, I have tagged our NiFi experts @SAMSAL @MattWho who may be able to assist you further. Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
02-25-2025
07:50 AM
@pablobhz Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
02-25-2025
07:47 AM
@Jaydeep Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
02-24-2025
08:16 PM
@thomasLeecooper As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks.