Member since 
    
	
		
		
		09-25-2016
	
	
	
	
	
	
	
	
	
	
	
	
	
	
			
      
                1
            
            
                Post
            
        
                0
            
            
                Kudos Received
            
        
                0
            
            
                Solutions
            
        
			
    
	
		
		
		09-25-2016
	
		
		05:32 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 In How-to: Tune Your Apache Spark Jobs (Part 1), the following tip is given:  Avoid the flatMap-join-groupBy pattern. When two datasets are already grouped by key and you want to join them and keep them grouped, you can just use cogroup. That avoids all the overhead associated with unpacking and repacking the groups.  Why can't I just apply a join directly on the rdd's that are already grouped by? Wouldn't that yield the same result as in cogroup? 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
							Labels:
						
						
		
			
	
					
			
		
	
	
	
	
				
		
	
	
- Labels:
- 
						
							
		
			Apache Spark
 
        
