Member since 01-18-2016

169 Posts | 32 Kudos Received | 21 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1600 | 06-27-2025 06:00 AM |
|  | 1313 | 01-14-2025 06:30 PM |
|  | 1856 | 04-06-2018 09:24 PM |
|  | 1999 | 05-02-2017 10:43 PM |
|  | 5171 | 01-24-2017 08:21 PM |
			
    
	
		
		
09-04-2025 05:27 AM
Hi @huimin, perfect! If you could describe the solution you implemented, it will help others who encounter the same issue. Hugs.
						
					
07-24-2025 05:01 AM
```sql
SELECT * FROM your_table WHERE data_dt = '__HIVE_DEFAULT_PARTITION__';
```

OK, thanks for your reply; I understand this example. But my question is: in my third SQL statement, using the length() function in the SELECT list works and the result is 26, while in the first and second SQL statements the record cannot be filtered by length() in the WHERE clause when I use the length function on the partition key. Why is that?

Thank you
						
					
03-07-2025 03:24 PM
@Maulz Connecting Python to Cloudera, Hive, and Hue involves libraries and drivers that interface with HiveServer2, the service that allows remote clients to execute Hive queries. There are several ways to connect Python to Cloudera's ecosystem, particularly to access Hive tables through Hue. Below is a detailed, step-by-step guide covering the most common approaches.

1. Prerequisites

- Cloudera/Hadoop cluster: ensure HiveServer2 is running (default HiveServer2 port: 10000; verify via Cloudera Manager).
- Python environment: Python 3.6+ installed.
- Authentication: know your authentication method: username/password (non-secure), Kerberos (common in enterprise clusters), or LDAP.

2. Install Required Python Libraries

```bash
pip install pyhive       # Python interface for Hive
pip install thrift       # Thrift protocol support
pip install sasl         # SASL authentication (for Kerberos)
pip install thrift-sasl  # SASL wrapper for Thrift
pip install pykerberos   # Kerberos support (if needed)
```

For JDBC-based connections (an alternative method):

```bash
pip install JayDeBeApi   # JDBC bridge
```

3. Configure Cloudera/Hive

Via Cloudera Manager:
- Enable HiveServer2 and ensure it is running.
- Check the HiveServer2 port (default: 10000).
- If using Kerberos: ensure Kerberos is configured in Cloudera, then obtain a ticket from your keytab:

```bash
kinit -kt <keytab_file> <principal>
```

4. Connecting Python to Cloudera/Hue/Hive

Option 1: PyHive, a Python library designed specifically to work with Hive:

```python
from pyhive import hive

# Connect to the Hive server
conn = hive.Connection(
    host='cloudera_host_name',
    port=10000,               # Default HiveServer2 port
    username='your_username',
    password='your_password',
    database='default',       # Your database name
    auth='LDAP'               # Or 'NONE', 'KERBEROS', 'CUSTOM', depending on your setup
)

# Create a cursor
cursor = conn.cursor()

# Execute a query
cursor.execute('SELECT * FROM your_table LIMIT 10')

# Fetch results
results = cursor.fetchall()
print(results)

# Close connections
cursor.close()
conn.close()
```

Option 2: the Impala connection (from the impyla package), if your Cloudera cluster uses Impala:

```python
from impala.dbapi import connect

conn = connect(
    host='cloudera_host_name',
    port=21050,               # Default Impala port
    user='your_username',
    password='your_password',
    database='default'        # Your database name
)

cursor = conn.cursor()
cursor.execute('SELECT * FROM your_table LIMIT 10')
results = cursor.fetchall()
print(results)

cursor.close()
conn.close()
```

Option 3: integration with Hue. Hue is a web UI for Hadoop, but you can programmatically interact with Hive via its (limited) APIs. For direct Python-Hue integration, use Hue's REST API to execute queries:

```python
import requests

# Hue API endpoint (replace with your Hue server URL)
url = "http://<hue_server>:8888/hue/notebook/api/execute/hive"
headers = {"Content-Type": "application/json"}
data = {
    "script": "SELECT * FROM my_table",
    "dialect": "hive"
}

response = requests.post(
    url,
    auth=('<hue_username>', '<hue_password>'),
    headers=headers,
    json=data
)
print(response.json())
```

5. Troubleshooting

Common issues:
- Connection refused: verify HiveServer2 is running (netstat -tuln | grep 10000) and check firewall rules.
- Authentication failures: for Kerberos, ensure kinit succeeded; for LDAP, validate credentials.
- Thrift version mismatch: use Thrift v0.13.0 with Hive 3.x.
- Logs: check the HiveServer2 logs in Cloudera Manager (/var/log/hive).

6. Best Practices

- Use connection pooling for high-frequency queries.
- For Kerberos, automate ticket renewal with kinit cron jobs.
- Secure credentials using environment variables or Vault (a hedged sketch combining the Kerberos and credentials points follows below).

Happy hadooping
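As a hedged sketch tying the Kerberos and credentials points together — assuming the SASL packages above are installed, a valid ticket already exists from kinit, and a purely illustrative HIVE_HOST environment variable that you would export yourself — a connection might look like this:

```python
import os
from pyhive import hive

# Illustrative: read the host from the environment instead of hard-coding it.
# HIVE_HOST is a hypothetical variable; the fallback mirrors the examples above.
host = os.environ.get("HIVE_HOST", "cloudera_host_name")

# With auth='KERBEROS', PyHive uses the ticket cache populated by kinit,
# so no username/password appears in the code. The service name is
# typically 'hive', but it depends on your cluster's Hive principal.
conn = hive.Connection(
    host=host,
    port=10000,
    auth='KERBEROS',
    kerberos_service_name='hive',
)

cursor = conn.cursor()
cursor.execute('SELECT 1')   # trivial smoke test of the connection
print(cursor.fetchall())
cursor.close()
conn.close()
```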
						
					
01-22-2025 05:52 PM
James,

Thanks for your help. Your reply that "user is required on the active NN" is right to the point. SSSD is mentioned in various online documents related to enabling Kerberos. In my case, SSSD is a background process and I do not need to configure it, right?

Best regards,
						
					
12-12-2024 09:40 AM
Though one can intervene manually to fix under-replicated blocks, HDFS has matured a lot, and the NameNode will take care of fixing under-replicated blocks on its own. The drawback of the manual step is that it may add extra load to NameNode operations and may cause performance degradation for existing jobs. So if you plan to do it manually, do it outside business hours or over the weekend. If you want to check the current count before deciding, see the sketch below.
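A minimal, hedged sketch of checking that count from Python — assuming the hdfs CLI is on PATH, the caller has HDFS access, and fsck's summary output contains its usual "Under-replicated blocks:" line (all assumptions about your environment):

```python
import subprocess

def count_under_replicated(path="/"):
    """Run 'hdfs fsck' and pull the under-replicated block count
    from its summary output (assumes the usual summary format)."""
    # No check=True: fsck exits nonzero when the filesystem is not HEALTHY,
    # and we still want to parse its output in that case.
    out = subprocess.run(
        ["hdfs", "fsck", path],
        capture_output=True, text=True,
    ).stdout
    for line in out.splitlines():
        # Summary line looks like: "Under-replicated blocks:  12 (0.5 %)"
        if "Under-replicated blocks" in line:
            return int(line.split(":")[1].split()[0])
    return 0

if __name__ == "__main__":
    print(count_under_replicated("/"))
```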
						
					
11-22-2024 10:25 AM
1 Kudo
@weixin As a test, try using curl and make sure you have a Kerberos ticket:

```bash
curl -u : --negotiate http://YOURHOST:PORT/jmx
```

You may need to open a support case for this. I also highly recommend upgrading to CDP 7.1.9. (A Python version of the same test follows below.)
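For anyone who prefers testing from Python, a hedged sketch under these assumptions: the third-party requests-kerberos package is installed, a ticket already exists from kinit, and YOURHOST:PORT is the same placeholder endpoint as in the curl command above.

```python
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL  # pip install requests-kerberos

# Placeholder endpoint from the curl example above; assumes kinit has
# already been run so a valid Kerberos ticket is in the cache.
url = "http://YOURHOST:PORT/jmx"

# SPNEGO/negotiate auth, the Python analogue of curl's --negotiate flag.
resp = requests.get(url, auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL))
print(resp.status_code)
print(resp.text[:500])  # first part of the JMX output
```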
						
					
04-17-2024 06:35 AM
Hello @DianaTorres, I've tried to follow the instructions and looked at a few changes in our configuration, but I can't easily figure out the solution; it might be simple, but it's my first attempt at this setup. I tried to create a Cloudera support case but couldn't; it seems I don't have the rights. Could you create one on my behalf? As a reminder of the issue:
- We declare two realms in the conf file.
- When attempting a connection, specifically with Hive where the error pops up, it seems the process only takes the default realm into account.
- As we have two tickets simultaneously, one for each realm, the connection fails, likely because it does not match the right ticket credentials to the right domain.
						
					
02-28-2024 09:25 AM
Hi, do you have a question? The HDP Sandbox is no longer available or supported.
						
					
01-19-2024 04:54 PM
2 Kudos
That's a lot of log output. Some of the error messages you see are normal, and I'm not sure what your issue is. Do you see Cloudera Management Service below the cluster services in CM (at the very bottom when you click Cloudera Manager, top left)? If so, click Instances and figure out which components/roles are not started. You can also click and start them one by one. Then you can look at the startup logs in the CM UI pop-up after each one starts or fails. Check in the order of STDOUT, STDERR, and lastly ROLE LOG, which is the log after the role has started. You may need to check the Full Log.
						
					
07-07-2020 04:34 AM
Solr includes the specified file's terms in an index. Indexing in Solr is similar to creating the index at the end of a book, which lists the words that appear in that book and their locations: essentially, we take an inventory of the words that appear in the book and of the pages where those words appear. That is, by including content in the index, we make that content available for search by Solr. This type of index, called an inverted index, is a way of structuring the information that will be retrieved by a search engine (a toy sketch follows below). You can find a longer explanation of the way information is stored and retrieved by Solr at https://www.solr-tutorial.com/indexing-with-solr.html
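To make the book analogy concrete, here is a minimal, purely illustrative sketch of the inverted-index idea (the documents and words are made up; this is the concept, not how Solr itself is implemented):

```python
from collections import defaultdict

# Two toy "pages" of text, keyed by a document id.
docs = {
    1: "solr builds an inverted index",
    2: "an index maps words to pages",
}

# Map each word to the set of documents where it appears --
# the inventory of words and "pages" described above.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        inverted[word].add(doc_id)

# Searching a term is now a dictionary lookup rather than a scan
# of every document.
print(sorted(inverted["index"]))  # -> [1, 2]
print(sorted(inverted["solr"]))   # -> [1]
```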
						
					