Member since 
    
	
		
		
		04-08-2014
	
	
	
	
	
	
	
	
	
	
	
	
	
	
			
      
                70
            
            
                Posts
            
        
                20
            
            
                Kudos Received
            
        
                12
            
            
                Solutions
            
        My Accepted Solutions
| Title | Views | Posted | 
|---|---|---|
| 8774 | 07-16-2018 04:12 PM | |
| 8889 | 07-13-2018 03:17 PM | |
| 9051 | 07-10-2018 03:00 PM | |
| 9063 | 07-10-2018 02:54 PM | |
| 9306 | 07-05-2018 03:35 PM | 
			
    
	
		
		
		04-25-2019
	
		
		12:43 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 Let's try to rule out various types of problems.     1. Are you able to read/write to Kerberos-enabled HDFS with PySpark? Is Kudu the only Kerberos-enabled service that is not working from within PySpark?     2. Have you checked to ensure that the Spark driver is running on the host and shell you kinited from instead of being started in a YARN container? If it's running in YARN you have to give YARN access to the keytab to run as.     3. Have you tried connecting to Kudu with the regular Spark shell? Does it work? For examples see https://kudu.apache.org/docs/developing.html#_kudu_integration_with_spark 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		04-23-2019
	
		
		05:45 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 Is your cluster Kerberos-enabled? If so, did you kinit before running the job? Try a local driver before trying a distributed driver to rule out keytab-related issues. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		01-16-2019
	
		
		08:55 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		1 Kudo
		
	
				
		
	
		
					
							 Kudu runs as a separate service that Impala talks to (like HDFS runs as a separate service from Impala) so you have to have Kudu running somewhere for it to work. However you don't have to run Kudu on the same servers that you run Impala on -- remote reads are supported over the network. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		01-16-2019
	
		
		08:46 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 EricL is correct, you don't need to worry about files with Kudu in the same way that you have to worry about them with typical Hive tables. Kudu stores its data directly on ext4 in a distributed way and does not use HDFS.     You can take a look at where Kudu is storing its data on the local file system if you go into Cloudera manager and take a look at how the --fs-data-dirs and --fs-wal-dir configuration options are set up across the various Tablet Server nodes.     Hope that helps,  Mike    
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		11-27-2018
	
		
		01:02 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		1 Kudo
		
	
				
		
	
		
					
							 A WAL file is a Kudu tablet write-ahead log file. You can read an overview of how the Kudu write path works here (it's a fairly techincal blog post): https://blog.cloudera.com/blog/2017/04/apache-kudu-read-write-paths/     The WAL file location is controlled by the configuration parameter --fs_wal_dir which you can read about at https://kudu.apache.org/docs/configuration_reference.html#kudu-tserver_fs_wal_dir 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		09-28-2018
	
		
		11:36 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 FYI, I think my reply from 9/21 was wrong. As far as I can tell, the rules work as follows:     1. If the Kudu table is managed by Impala, it's not possible to change the kudu.table_name and kudu.master_addresses properties. This is the case when it's not an EXTERNAL table. See https://issues.apache.org/jira/browse/IMPALA-5654 for more information on that.     I have filed an improvement request to track automatically renaming the Kudu table when the Impala table is renamed to keep them in sync, but right now it's not possible. See https://issues.apache.org/jira/browse/IMPALA-7640 for more information.     2. If you have an EXTERNAL table (Kudu table not managed by Impala) then you are able to alter the kudu.table_name table property.     The above was tested on a non-secure cluster, and I would be interested to hear if others' experiences are the same as mine were even on a secured cluster. However I believe the behavior is the same in both cases.     Hope this helps,  Mike    
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		09-25-2018
	
		
		04:08 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 Just following up here, I just tested this on Impala version 2.13 (dev version) and I cannot reproduce the ability to alter table set tblproperties to rename the Kudu table name even after altering the Impala table name. Is anyone else able to reproduce this? I get the following error:     > alter table mpercy_k2 set tblproperties('kudu.table_name'='impala::default.mpercy_k2');  Query: alter table mpercy_k2 set tblproperties('kudu.table_name'='impala::default.mpercy_k2')  ERROR: AnalysisException: Not allowed to set 'kudu.table_name' manually for managed Kudu tables .     However this is by design from what I have discussed with some others.     I think the "bug" is that Impala alter table doesn't automatically rename the Kudu table internally. However it would be a security problem to be able to alter the kudu table name with tblproperties because Sentry applies the security rules to the Impala table name. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		09-21-2018
	
		
		06:35 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		2 Kudos
		
	
				
		
	
		
					
							 @Ankit_Mishra's answer is the correct way to do the procedure you want to do, Impala doesn't allow for separately managing the Kudu and Impala tables if you create the Kudu table through Impala. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		09-21-2018
	
		
		06:16 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 @Andreyeff Another thing you can try doing is increasing the raft heartbeat interval from 500ms to 1500ms or even 3000ms, see https://kudu.apache.org/docs/configuration_reference.html#kudu-tserver_raft_heartbeat_interval_ms     This will affect your recovery time by a few seconds if a leader fails since by default, elections don't happen for 3 missed heartbeat periods (controlled by https://kudu.apache.org/docs/configuration_reference.html#kudu-tserver_leader_failure_max_missed_heartbeat_periods ) 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		09-18-2018
	
		
		09:47 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 Frankly it sounds like you should revisit your capacity planning.     You can try bumping up raft consensus timeouts and upgrading to the latest version of Kudu but it may not help that much. 
						
					
					... View more