Member since 06-23-2016
	
	
	
	
	
	
	
	
	
	
	
	
	
	
			
      
13 Posts
2 Kudos Received
0 Solutions
            
        
			
    
	
		
		
09-07-2017 08:32 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@Eugene Koifman That helped reduce spilled_rows from 11 billion to 5 billion. I was under the impression that inserting data into a partition is faster with a distribute by, so this was useful.

Also, I have heard that compressing the intermediate files helps reduce spilled_rows. Is that correct?

set mapreduce.map.output.compress = true
set mapreduce.output.fileoutputformat.compress = true

Or is there anything else we can do to optimize the query?
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
07-11-2017 03:03 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@Gaurav Mallikarjuna In the above example, notice that I used another method to connect to HiveServer2: the HiveServer2 node's hostname plus its port number, like this:

$ beeline -u "jdbc:hive2://dkhdp261c6.openstacklocal:10000/" -n admin

The admin user is only for my sample. In your case, if your transport mode is binary and the cluster is not Kerberized, use:

$ beeline -u "jdbc:hive2://<hiveserver2-hostname>:10000/" -n <username>
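If you connect often, the direct host-and-port form can be wrapped in a small script. This is only a rough sketch: build_hs2_url, HS2_HOST and HS2_USER are made-up names for illustration (not part of beeline), hiveserver2.example.com is a placeholder host, and it assumes binary transport on port 10000 with no Kerberos, as described in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of beeline): builds a HiveServer2 JDBC URL
# for binary transport mode on a non-Kerberized cluster.
build_hs2_url() {
  local host="$1"
  local port="${2:-10000}"   # 10000 is the port used in this post; yours may differ
  printf 'jdbc:hive2://%s:%s/' "$host" "$port"
}

# Placeholder values: override them with your own HiveServer2 host and user.
HS2_HOST="${HS2_HOST:-hiveserver2.example.com}"
HS2_USER="${HS2_USER:-admin}"

url="$(build_hs2_url "$HS2_HOST")"

# Print the command instead of running it, so it can be reviewed first.
echo "beeline -u \"$url\" -n $HS2_USER"
```

Running it with HS2_HOST and HS2_USER set prints the beeline command for that host, which you can then copy and run on a node where beeline is installed.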
 
						
					