Member since 10-12-2017
4 Posts
2 Kudos Received
0 Solutions
			
    
	
		
		
01-31-2018 09:42 AM
@Dongjoon Hyun Just want to check whether the ORC library version change, i.e. the upgrade to ORC 1.4.1, is being picked up as part of the Spark 2.3 release. I have gone through the PRs under SPARK-20901, but I didn't find any conversation related to the ORC library upgrade.
						
					
01-16-2018 06:49 PM
1 Kudo
Thanks for the update. Vectorisation support is another feature we have been looking forward to for a long time.
						
					
01-16-2018 06:32 AM
Thanks, Dongjoon, for the reply. But what about people who don't use HDP? Is there an open JIRA where someone is working on integrating the latest version of Hive with Spark? If you are aware of any such thread, can you please share the link?
						
					
01-15-2018 07:01 AM
1 Kudo
							    
We use Spark to flatten clickstream data and then write it to S3 in ORC+zlib format. I have tried changing many settings in Spark, but the stripes in the resulting ORC files are still very small (<2MB).

Things I have tried so far to increase the stripe size:

- Earlier each file was 20MB in size. Using coalesce, I am now creating files of 250-300MB, but there are still 200 stripes per file, i.e. each stripe is <2MB.
- Tried using HiveContext instead of SparkContext and setting hive.exec.orc.default.stripe.size to 67108864, but Spark isn't honoring this parameter.

So, any idea how I can increase the stripe size of the ORC files being created? The problem with small stripes is that when we query these ORC files using Presto and the stripe size is less than 8MB, Presto reads the whole data file instead of only the fields selected in the query.

Presto stripe issue related thread: https://groups.google.com/forum/#!topic/presto-users/7NcrFvGpPaA
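To make the numbers above concrete, here is a rough back-of-the-envelope sketch in plain Python (it is not Spark code and does not change any setting). The function names are hypothetical, the 8MB constant is just the Presto threshold mentioned above, and the arithmetic ignores ORC footers, indexes and compression effects, so treat the results as approximations only.

```python
import math

MB = 1024 * 1024

# Threshold quoted in the post: below this stripe size Presto is reported
# to read the whole file. Assumed value, taken from the text above.
PRESTO_SMALL_STRIPE_MB = 8


def avg_stripe_size_mb(file_size_mb, stripe_count):
    """Average stripe size in MB, ignoring footer and index overhead."""
    return file_size_mb / stripe_count


def stripes_at_target(file_size_mb, target_stripe_bytes):
    """Approximate stripe count if every stripe reached the target size."""
    return math.ceil(file_size_mb * MB / target_stripe_bytes)


# The value tried for hive.exec.orc.default.stripe.size (64MB in bytes).
target_stripe_bytes = 67108864

for file_size_mb in (250, 300):
    avg = avg_stripe_size_mb(file_size_mb, 200)
    expected = stripes_at_target(file_size_mb, target_stripe_bytes)
    print(
        f"{file_size_mb}MB file: ~{avg}MB per stripe with 200 stripes "
        f"(below {PRESTO_SMALL_STRIPE_MB}MB: {avg < PRESTO_SMALL_STRIPE_MB}), "
        f"~{expected} stripes if the 64MB setting were honored"
    )
```

In other words, if the 64MB setting were taking effect, files of this size would be expected to have only a handful of stripes rather than 200.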
						
					
Labels: Apache Spark