Member since
01-18-2016
169
Posts
32
Kudos Received
21
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1527 | 06-27-2025 06:00 AM |
| | 1277 | 01-14-2025 06:30 PM |
| | 1839 | 04-06-2018 09:24 PM |
| | 1970 | 05-02-2017 10:43 PM |
| | 5139 | 01-24-2017 08:21 PM |
09-04-2025
05:27 AM
Hi @huimin, perfect! If you could describe the solution you implemented, it would help others who run into the same issue. Hugs.
07-24-2025
05:01 AM
SELECT * FROM your_table WHERE data_dt = '__HIVE_DEFAULT_PARTITION__';

OK, thanks for your reply, I understand this example. But my question is about my third SQL statement: using the length() function in the SELECT clause works and returns 26, so why can't this record be filtered by length in the WHERE clause of the first and second statements, where I apply length() to the partition key? Thank you.
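For reference, the 26 returned by length() matches the length of the placeholder string quoted above, which suggests the SELECT is seeing the literal placeholder text. A quick check in plain Python (string arithmetic only, no Hive involved):

```python
# Hive's placeholder partition name, as quoted in the query above.
placeholder = "__HIVE_DEFAULT_PARTITION__"

# Length of the literal placeholder text.
placeholder_length = len(placeholder)
print(placeholder_length)  # 26
```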
03-07-2025
03:24 PM
@Maulz Connecting Python to Cloudera, Hive, and Hue means using libraries and drivers that talk to HiveServer2, the service that lets remote clients execute Hive queries. There are several ways to connect Python to Cloudera's ecosystem, particularly to reach Hive tables through Hue. Below is a step-by-step guide to the most common approaches.

1. Prerequisites

- Cloudera/Hadoop cluster: Ensure HiveServer2 is running on your cluster. The default HiveServer2 port is 10000 (verify via Cloudera Manager).
- Python environment: Python 3.6+ installed.
- Authentication: Know your authentication method: username/password (non-secure), Kerberos (common in enterprise clusters), or LDAP.

2. Install Required Python Libraries

Use pip to install:

    pip install pyhive        # Python interface for Hive
    pip install thrift        # Thrift protocol support
    pip install sasl          # SASL authentication (for Kerberos)
    pip install thrift-sasl   # SASL wrapper for Thrift
    pip install pykerberos    # Kerberos support (if needed)

For JDBC-based connections (alternative method):

    pip install JayDeBeApi    # JDBC bridge

3. Configure Cloudera/Hive

- Via Cloudera Manager: Enable HiveServer2 and ensure it's running.
- Check the HiveServer2 port (default: 10000).
- If using Kerberos: Ensure Kerberos is configured in Cloudera.

With Kerberos, obtain a ticket from your keytab before connecting:

    kinit -kt <keytab_file> <principal>

Connecting Python to Cloudera/Hue/Hive

1. Using PyHive

PyHive is a Python library designed specifically for working with Hive.

    from pyhive import hive

    # Connect to the Hive server
    conn = hive.Connection(
        host='cloudera_host_name',
        port=10000,                # Default HiveServer2 port
        username='your_username',
        password='your_password',
        database='default',        # Your database name
        auth='LDAP'                # Or 'NONE', 'KERBEROS', 'CUSTOM' depending on your authentication setup
    )

    # Create a cursor
    cursor = conn.cursor()

    # Execute a query
    cursor.execute('SELECT * FROM your_table LIMIT 10')

    # Fetch results
    results = cursor.fetchall()
    print(results)

    # Close connections
    cursor.close()
    conn.close()

2. Using the Impala Connection

If your Cloudera cluster uses Impala:

    # impala.dbapi is provided by the impyla package (pip install impyla)
    from impala.dbapi import connect

    conn = connect(
        host='cloudera_host_name',
        port=21050,                # Default Impala port
        user='your_username',
        password='your_password',
        database='default'         # Your database name
    )

    cursor = conn.cursor()
    cursor.execute('SELECT * FROM your_table LIMIT 10')
    results = cursor.fetchall()
    print(results)

    cursor.close()
    conn.close()

3. Integration with Hue

Hue is a web UI for Hadoop, but you can interact with Hive programmatically through its APIs (limited). For direct Python-Hue integration, use Hue's REST API to execute queries. Endpoint paths and authentication differ between Hue versions, so check the API documentation for your release.

    import requests

    # Hue API endpoint (replace with your Hue server URL)
    url = "http://<hue_server>:8888/hue/notebook/api/execute/hive"
    headers = {"Content-Type": "application/json"}
    data = {
        "script": "SELECT * FROM my_table",
        "dialect": "hive"
    }

    response = requests.post(
        url,
        auth=('<hue_username>', '<hue_password>'),
        headers=headers,
        json=data
    )
    print(response.json())

Troubleshooting Common Issues

- Connection refused: Verify HiveServer2 is running (netstat -tuln | grep 10000). Check firewall rules.
- Authentication failures: For Kerberos, ensure kinit succeeded. For LDAP, validate credentials.
- Thrift version mismatch: Use Thrift v0.13.0 with Hive 3.x.
- Logs: Check HiveServer2 logs in Cloudera Manager (/var/log/hive).

4. Best Practices

- Use connection pooling for high-frequency queries.
- For Kerberos, automate ticket renewal with kinit cron jobs.
- Secure credentials using environment variables or Vault.

Happy hadooping
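As one way to follow the advice above about keeping credentials in environment variables, here is a minimal sketch that builds the PyHive connection arguments from the environment. The HIVE_* variable names and both helper functions are hypothetical, chosen for illustration only. The sketch assumes PyHive's rule that a password is passed only for LDAP or CUSTOM authentication, and that "hive" is the Kerberos service name on your cluster.

```python
import os


def hive_connection_kwargs(env=None):
    """Build keyword arguments for pyhive.hive.Connection from environment variables.

    The HIVE_* variable names are hypothetical; rename them to suit your setup.
    """
    env = os.environ if env is None else env
    auth = env.get("HIVE_AUTH", "NONE").upper()
    kwargs = {
        "host": env["HIVE_HOST"],
        "port": int(env.get("HIVE_PORT", "10000")),  # default HiveServer2 port
        "username": env.get("HIVE_USER"),
        "database": env.get("HIVE_DATABASE", "default"),
        "auth": auth,
    }
    if auth in ("LDAP", "CUSTOM"):
        # PyHive expects a password only in LDAP or CUSTOM mode.
        kwargs["password"] = env["HIVE_PASSWORD"]
    elif auth == "KERBEROS":
        # Relies on an existing ticket from kinit; "hive" is an assumed service name.
        kwargs["kerberos_service_name"] = env.get("HIVE_KRB_SERVICE", "hive")
    return kwargs


def open_connection(env=None):
    """Open a HiveServer2 connection (needs PyHive and a reachable cluster)."""
    from pyhive import hive  # imported lazily so the helper above works without PyHive

    return hive.Connection(**hive_connection_kwargs(env))
```

With HIVE_HOST, HIVE_AUTH, and the matching credentials exported in the shell, open_connection() returns a connection that can be used exactly like conn in the PyHive example, and no password appears in the script itself.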
01-22-2025
05:52 PM
James, Thanks for your help. Your reply that "user is required on the active NN" is right to the point. SSSD is mentioned in various online documents related to enabling Kerberos. In my case, SSSD is a background process and I do not need to configure it, right? Best regards,
12-12-2024
09:40 AM
Though you can intervene manually to fix under-replicated blocks, HDFS has matured a lot, and the NameNode will fix under-replicated blocks on its own. The drawback of the manual step is that it may add load to NameNode operations and degrade the performance of existing jobs. So if you plan to do it manually, do it during the least busy hours or over the weekend.
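If you do decide to intervene manually, a common approach is to list the affected files with hdfs fsck and then reset their replication factor with hdfs dfs -setrep. The sketch below only parses fsck text and prepares the commands without running them, so they can be reviewed and scheduled for a quiet period. The helper names and the sample line are made up for illustration, and the exact fsck wording can differ between Hadoop versions.

```python
def under_replicated_paths(fsck_output):
    """Return distinct file paths that fsck flags as under replicated, in order.

    Assumes each flagged line starts with "<path>:", so paths that
    themselves contain a colon are not handled.
    """
    paths = []
    for line in fsck_output.splitlines():
        if "Under replicated" not in line:
            continue
        path = line.split(":", 1)[0].strip()
        if path and path not in paths:
            paths.append(path)
    return paths


def setrep_commands(paths, replication=3):
    """Build (but do not run) one 'hdfs dfs -setrep' command per path."""
    return [["hdfs", "dfs", "-setrep", str(replication), path] for path in paths]


# Illustrative input only; real text would come from `hdfs fsck /`.
sample = (
    "/data/a.csv:  Under replicated BP-1:blk_1001_1. "
    "Target Replicas is 3 but found 1 live replica(s).\n"
    "Status: HEALTHY\n"
)

for command in setrep_commands(under_replicated_paths(sample)):
    print(" ".join(command))
```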
11-22-2024
10:25 AM
1 Kudo
@weixin As a test, make sure you have a Kerberos ticket and then try curl:

curl -u : --negotiate http://YOURHOST:PORT/jmx

You may need to open a support case for this. I also highly recommend upgrading to CDP 7.1.9.
04-17-2024
06:35 AM
Hello @DianaTorres, I've tried to follow the instructions and looked at a few changes to our configuration, but I can't easily figure out the solution. It might be simple, but this is my first attempt at this setup. I tried to create a Cloudera support case but couldn't; it seems I don't have the rights. Could you create one on my behalf?

As a reminder of the issue:

- We declare two realms in the conf file.
- When attempting a connection, specifically with Hive, where the error pops up, the process seems to take only the default realm into account.
- As we hold two tickets simultaneously, one for each realm, the connection fails, likely because it does not find the ticket credentials for the right domain.
02-28-2024
09:25 AM
Hi, Do you have a question? The HDP Sandbox is no longer available or supported.
01-19-2024
04:54 PM
2 Kudos
That's a lot of log. Some of the error messages you see are normal. I'm not sure what your issue is. Do you see Cloudera Management Service below the Cluster services in CM (at the very bottom when you click Cloudera Manager - top left)? If so, click Instances and figure out which components/roles are not started. You can also click and start them one by one. Then you can look at the startup logs in the CM UI pop-up after it starts or fails. Check in the order of STDOUT, STDERR and lastly ROLE LOG, which is the log after it is started. You may need to check the Full Log.
07-07-2020
04:34 AM
Solr includes the terms of the specified files in an index. Indexing in Solr is similar to building the index at the end of a book, which lists the words that appear in the book and where they appear: we take an inventory of the words in the book and of the pages where each word occurs. That is, by including content in the index, we make that content available for search by Solr. This type of index, called an inverted index, is a way of structuring the information that a search engine will retrieve. You can find a longer explanation of how Solr stores and retrieves information at https://www.solr-tutorial.com/indexing-with-solr.html
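To make the book analogy concrete, here is a toy inverted index in Python. It is only an illustration of the idea, not how Solr is implemented internally; the function name and the sample pages are made up for the example.

```python
import re
from collections import defaultdict


def build_inverted_index(pages):
    """Map each lowercase word to the sorted list of page numbers containing it."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}


# Three made-up "pages" of a book.
pages = {
    1: "Solr builds an index",
    2: "The index maps words to pages",
    3: "Search looks up words in the index",
}

index = build_inverted_index(pages)
print(index["index"])  # [1, 2, 3]
print(index["words"])  # [2, 3]
```

A search for a word then becomes a dictionary lookup that returns the pages containing it, instead of a scan of every page.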