Member since 02-10-2022

02-10-2022 12:16 PM

Hello,

I have a question about optimizing Hive bucketed tables (bucketed only, no partitions).

I know that if we join table A and table B on COL1 (A.COL1 = B.COL1), we should bucket both tables on COL1 into the same number of buckets, or into bucket counts that are multiples of each other.

But what if there are more than two tables? For example, I want to join table A with both table B and table C. Table A is joined with table B on COL1 (A.COL1 = B.COL1) and with table C on COL2 (A.COL2 = C.COL2).

Which columns should I cluster table A by? Should it be bucketed on both COL1 and COL2 (CLUSTERED BY (col1, col2))?

In summary, how can I optimize one table that is joined with more than one table, using buckets only?

Thanks in advance.
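
To make the two-table case concrete, here is a minimal Python sketch of how I understand bucket alignment. It assumes a toy rule for integer keys (bucket = key mod number of buckets). The names `bucket_of` and `buckets_align` and the bucket counts are hypothetical, and Hive's real hash function depends on the column type and the bucketing version, so this only illustrates the "same number or a multiple" idea.

```python
def bucket_of(key: int, num_buckets: int) -> int:
    """Toy bucket assignment (assumption): non-negative key modulo bucket count."""
    return (key & 0x7FFFFFFF) % num_buckets


def buckets_align(keys, small: int, large: int) -> bool:
    """Return True if every bucket of the `large`-bucket table maps to
    exactly one bucket of the `small`-bucket table for the given keys."""
    mapping = {}
    for k in keys:
        big, sm = bucket_of(k, large), bucket_of(k, small)
        # Remember the first small bucket seen for this large bucket;
        # a different one later means the buckets do not line up.
        if mapping.setdefault(big, sm) != sm:
            return False
    return True


keys = range(1, 101)
print(buckets_align(keys, 4, 8))  # 8 is a multiple of 4 -> True
print(buckets_align(keys, 4, 6))  # 6 is not a multiple of 4 -> False
```

With 4 and 8 buckets, each bucket of the larger table only ever needs one bucket of the smaller table; with 4 and 6 that one-to-one correspondence is lost.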
						
					