About james_jones

yagoaparecidoti · ‎09-04-2025

Hi @huimin, perfect! If you could describe the solution you implemented, it will help others who encounter the same issue. Hugs.

lingloong · ‎07-24-2025

SELECT * FROM your_table WHERE data_dt = '__HIVE_DEFAULT_PARTITION__';

OK, thanks for your reply, I understand this example. But my question is: in my third SQL statement, using the length() function in the SELECT clause works and the result is 26. Why, then, can't this record be filtered by length in the WHERE clause of the first and second SQL statements, when I apply length() to the partition key? Thank you.

Shelton · ‎03-07-2025

@Maulz Connecting Python to Cloudera, Hive, and Hue involves using libraries and drivers that interface with HiveServer2, the service that allows remote clients to execute Hive queries. There are several methods to connect Python to Cloudera's ecosystem, particularly to access Hive tables through Hue. I'll detail the most common approaches. Below is a detailed, step-by-step guide.

1. Prerequisites

- Cloudera/Hadoop Cluster: Ensure HiveServer2 is running on your cluster. Default HiveServer2 port: 10000 (verify via Cloudera Manager).
- Python Environment: Python 3.6+ installed.
- Authentication: Know your authentication method: username/password (non-secure), Kerberos (common in enterprise clusters), or LDAP.

2. Install Required Python Libraries

Use pip to install:

    pip install pyhive       # Python interface for Hive
    pip install thrift       # Thrift protocol support
    pip install sasl         # SASL authentication (for Kerberos)
    pip install thrift-sasl  # SASL wrapper for Thrift
    pip install pykerberos   # Kerberos support (if needed)

For JDBC-based connections (alternative method):

    pip install JayDeBeApi   # JDBC bridge

3. Configure Cloudera/Hive

Via Cloudera Manager:

- Enable HiveServer2 and ensure it's running.
- Check the HiveServer2 port (default: 10000).
- If using Kerberos: Ensure Kerberos is configured in Cloudera, then obtain a ticket from your Kerberos keytab:

    kinit -kt <keytab_file> <principal>

4. Connecting Python to Cloudera/Hue/Hive

Method 1: Using PyHive

PyHive is a Python library specifically designed to work with Hive.

    from pyhive import hive

    # Connect to the Hive server
    conn = hive.Connection(
        host='cloudera_host_name',
        port=10000,                # Default HiveServer2 port
        username='your_username',
        password='your_password',  # Only pass a password with auth='LDAP' or 'CUSTOM'
        database='default',        # Your database name
        auth='LDAP'                # Or 'NONE', 'KERBEROS', 'CUSTOM' depending on your authentication setup
    )

    # Create a cursor
    cursor = conn.cursor()

    # Execute a query
    cursor.execute('SELECT * FROM your_table LIMIT 10')

    # Fetch results
    results = cursor.fetchall()
    print(results)

    # Close connections
    cursor.close()
    conn.close()

Method 2: Using the Impala Connection

If your Cloudera cluster uses Impala (the impala.dbapi module comes from the impyla package):

    from impala.dbapi import connect

    conn = connect(
        host='cloudera_host_name',
        port=21050,                # Default Impala port
        user='your_username',
        password='your_password',
        database='default'         # Your database name
    )

    cursor = conn.cursor()
    cursor.execute('SELECT * FROM your_table LIMIT 10')
    results = cursor.fetchall()
    print(results)

    cursor.close()
    conn.close()

Method 3: Integration with Hue

Hue is a web UI for Hadoop, but you can programmatically interact with Hive via its APIs (limited). For direct Python-Hue integration, use Hue's REST API to execute queries:

    import requests

    # Hue API endpoint (replace with your Hue server URL)
    url = "http://<hue_server>:8888/hue/notebook/api/execute/hive"
    headers = {"Content-Type": "application/json"}
    data = {
        "script": "SELECT * FROM my_table",
        "dialect": "hive"
    }

    response = requests.post(
        url,
        auth=('<hue_username>', '<hue_password>'),
        headers=headers,
        json=data
    )
    print(response.json())

5. Troubleshooting Common Issues

- Connection Refused: Verify HiveServer2 is running (netstat -tuln | grep 10000). Check firewall rules.
- Authentication Failures: For Kerberos, ensure kinit succeeded. For LDAP, validate credentials.
- Thrift Version Mismatch: Use Thrift v0.13.0 with Hive 3.x.
- Logs: Check HiveServer2 logs in Cloudera Manager (/var/log/hive).

6. Best Practices

- Use connection pooling for high-frequency queries.
- For Kerberos, automate ticket renewal with kinit cron jobs.
- Secure credentials using environment variables or Vault.

Happy hadooping
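To illustrate the advice about keeping credentials in environment variables, here is a minimal sketch that builds the keyword arguments for PyHive's hive.Connection from the environment. The HIVE_* variable names and the helper name hive_settings_from_env are hypothetical, not something PyHive or Cloudera defines. The password is only included for LDAP or CUSTOM auth, on the assumption that PyHive rejects a password in the other modes.

```python
import os


def hive_settings_from_env(env=None):
    """Build keyword arguments for pyhive.hive.Connection from environment variables.

    The HIVE_* names are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    settings = {
        "host": env["HIVE_HOST"],                    # required: KeyError if unset
        "port": int(env.get("HIVE_PORT", "10000")),  # default HiveServer2 port
        "database": env.get("HIVE_DATABASE", "default"),
        "auth": env.get("HIVE_AUTH", "NONE"),
    }
    if "HIVE_USER" in env:
        settings["username"] = env["HIVE_USER"]
    # Assumption: a password is only valid together with LDAP or CUSTOM auth.
    if settings["auth"] in ("LDAP", "CUSTOM"):
        settings["password"] = env["HIVE_PASSWORD"]  # required in these modes
    return settings
```

With the variables exported in the shell, the connection from Method 1 would become hive.Connection(**hive_settings_from_env()), so no password appears in the script itself.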

Seaport · ‎01-22-2025

James, Thanks for your help. Your reply that "user is required on the active NN" is right to the point. SSSD is mentioned in various online documents related to enabling Kerberos. In my case, SSSD is a background process and I do not need to configure it, right? Best regards,

sathishkr · ‎12-12-2024

Although you can intervene manually to fix under-replicated blocks, HDFS has matured a lot and the NameNode will take care of fixing under-replicated blocks on its own. The drawback of the manual step is that it adds load to NameNode operations and may degrade the performance of running jobs. So if you plan to do it manually, do it during the least busy hours or over the weekend.

james_jones · ‎11-22-2024

@weixin As a test, try using curl and make sure you have a Kerberos ticket:

    curl -u : --negotiate http://YOURHOST:PORT/jmx

You may need to open a support case for this. I also highly recommend upgrading to CDP 7.1.9.
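If you want to script that test, the sketch below wraps the same curl invocation in Python. It assumes curl (built with SPNEGO support) and an MIT-style klist are on the PATH; the function names are hypothetical, and YOURHOST and PORT remain placeholders to fill in.

```python
import subprocess


def build_jmx_curl_command(host, port, scheme="http"):
    """Return the argument list for the SPNEGO-authenticated curl call to /jmx."""
    return ["curl", "-u", ":", "--negotiate", f"{scheme}://{host}:{port}/jmx"]


def has_kerberos_ticket():
    """Return True if `klist -s` exits with 0, i.e. a readable, unexpired ticket cache."""
    try:
        return subprocess.run(["klist", "-s"], check=False).returncode == 0
    except FileNotFoundError:
        return False  # klist is not installed


def fetch_jmx(host, port):
    """Run the curl test and return the response body as text."""
    if not has_kerberos_ticket():
        raise RuntimeError("No valid Kerberos ticket found; run kinit first")
    result = subprocess.run(
        build_jmx_curl_command(host, port),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if curl itself fails
    )
    return result.stdout
```

Checking for a ticket first separates "no ticket" from "the endpoint rejected the ticket", which is usually the first thing to rule out.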

dqsdqs · ‎04-17-2024

Hello @DianaTorres, I've tried to follow the instructions and looked at a few changes in our configuration, but I can't easily figure out the solution. It might be simple, but it's my first attempt at this setup. I've tried to create a Cloudera support case but couldn't; it seems I don't have the rights. Could you create one on my behalf? As a reminder of the issue:
- We declare two realms in the conf file.
- When attempting a connection, specifically with Hive, where the error pops up, it seems the process only takes the default realm into account.
- As we hold two tickets simultaneously, one for each realm, the connection fails, likely because it does not find the right ticket credentials for the right domain.

james_jones · ‎02-28-2024

Hi, Do you have a question? The HDP Sandbox is no longer available or supported.

james_jones · ‎01-19-2024

That's a lot of log. Some of the error messages you see are normal. I'm not sure what your issue is. Do you see Cloudera Management Service below the Cluster services in CM (at the very bottom when you click Cloudera Manager - top left)? If so, click Instances and figure out which components/roles are not started. You can also click and start them one by one. Then you can look at the startup logs in the CM UI pop-up after it starts or fails. Check in the order of STDOUT, STDERR and lastly ROLE LOG, which is the log after it is started. You may need to check the Full Log.

lusitez · ‎07-07-2020

Solr adds the terms of the specified files to an index. Indexing in Solr is similar to creating the index at the end of a book, which lists the words that appear in the book and where they appear: we take an inventory of the words in the book and of the pages on which those words appear. In other words, by including content in the index, we make that content available for search by Solr. This type of index, called an inverted index, is a way of structuring the information that a search engine will retrieve. You can find a longer explanation of how Solr stores and retrieves information at https://www.solr-tutorial.com/indexing-with-solr.html
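The book analogy can be sketched in a few lines of Python. This is only a toy illustration of the inverted-index idea, not how Solr stores its index internally; the sample pages and function names are made up for the example.

```python
import re
from collections import defaultdict


def build_inverted_index(pages):
    """Map each lowercase word to the sorted list of page numbers it appears on."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_number)
    return {word: sorted(page_numbers) for word, page_numbers in index.items()}


def search(index, word):
    """Look up a single word; returns an empty list when the word was never indexed."""
    return index.get(word.lower(), [])


# Hypothetical "book" with three pages
pages = {
    1: "Solr builds an index",
    2: "An index maps words to pages",
    3: "Search uses the index",
}
index = build_inverted_index(pages)
print(search(index, "index"))  # [1, 2, 3]
print(search(index, "words"))  # [2]
```

A lookup goes from word to pages without scanning the text, and a word that was never indexed cannot be found, which is why content has to be indexed before it is searchable.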

Last Visited	‎09-25-2025 01:25 PM

Member Since	‎01-18-2016 02:01 PM
Posts	169
Kudos received	31

Cloudera Community

Re: Connect Trino to Cloudera Hive with Kerberos A...

Re: How do HDFS Permissions work after Kerberos is...

Re: Ambari SPN creation on remote AD

Re: Solr on HDF

Re: Wrong timezone in Ranger admin

Re: Connect Trino to Cloudera Hive with Kerberos A...

Re: search nothing about '__HIVE_DEFAULT_PARTIT...

Re: How to connect hive using Python for pytest da...

Re: How do HDFS Permissions work after Kerberos is...

Re: Fix Under-replicated blocks in HDFS manually

Re: CDH6.2.1 add Kerberos , hive server2 jmx thro...

Re: Connection to Hive & Impala - Kerberos Authent...

Re: hdp sandbox

Re: Getting Error while running the ansible play b...

Re: SOLR - how to use it