Member since 12-27-2016

| Posts | Kudos Received | Solutions |
|---|---|---|
| 73 | 34 | 5 |

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 27207 | 03-23-2018 09:21 PM |
| | 2824 | 02-05-2018 07:08 PM |
| | 10287 | 01-15-2018 07:21 PM |
| | 2646 | 12-01-2017 06:35 PM |
| | 6604 | 03-09-2017 06:21 PM |

08-24-2018 08:29 PM

@Manikandan Jeyabal, are you using the official Apache Spark? The new ORC vectorized reader was added in Apache Spark 2.3.0. Please see SPARK-16060.
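
For reference, a minimal sketch (assuming Spark 2.3+) of enabling the new ORC data source and its vectorized reader; the table path is a placeholder.

```scala
// Minimal sketch: reading ORC with the native data source added in Spark 2.3
// (SPARK-16060). The path below is a placeholder.
import org.apache.spark.sql.SparkSession

object NativeOrcReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("native-orc-read")
      .config("spark.sql.orc.impl", "native")                 // use the new OrcFileFormat
      .config("spark.sql.orc.enableVectorizedReader", "true") // vectorized batch reads
      .getOrCreate()

    val df = spark.read.orc("/tmp/example_orc_table")
    df.show()

    spark.stop()
  }
}
```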
  
						
					
03-26-2018 01:03 AM

Great! Thank you for sharing your experience, too. Your summary and understanding are correct. For Hive: the ORC writer and reader in Hive 1.2.1 are quite old, so they naturally have some bugs, although in general they will read new data correctly. For the best performance and safety, the latest Hive is recommended; Hive 2.3.0 is the first release to use Apache ORC. For the Apache ORC library: Apache Spark 2.3 was released with Apache ORC 1.4.1 for several reasons. Please use the latest one, Apache ORC 1.4.3, if possible, since there is a known issue, SPARK-23340.
						
					
03-24-2018 01:35 AM

Oh, is it? I'll try to reproduce your situation. Could you share more information about your software stack: Apache Spark 2.3 on Hadoop 2.7 and Kafka? Could you also confirm that you are using the new OrcFileFormat by setting `spark.sql.orc.impl=native`? The bugs above are fixed only in the new OrcFileFormat.
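
As a minimal sketch of that check (the session name is illustrative), the effective implementation can be set and inspected at runtime:

```scala
// Minimal sketch: setting and confirming the ORC implementation on a session.
// "native" selects the new OrcFileFormat; "hive" selects the old Hive 1.2.1-based one.
import org.apache.spark.sql.SparkSession

object CheckOrcImplExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("check-orc-impl").getOrCreate()

    spark.conf.set("spark.sql.orc.impl", "native")
    println(s"spark.sql.orc.impl = ${spark.conf.get("spark.sql.orc.impl")}")

    spark.stop()
  }
}
```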
						
					
03-23-2018 09:21 PM (2 Kudos)

Although it seems that you are hitting an output format issue, ORC is tested properly after SPARK-22781. As one example, a `FileNotFoundException` might occur because of an empty dataframe (SPARK-15474). There are more ORC issues before Apache Spark 2.3; please see SPARK-20901 for the full list.
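
A minimal sketch of that empty-dataframe case (paths and schema are illustrative placeholders):

```scala
// Minimal sketch around SPARK-15474: write an empty DataFrame as ORC and read it back.
// On older ORC support this round trip could fail (e.g. FileNotFoundException);
// Spark 2.3's native ORC source is expected to handle it.
import org.apache.spark.sql.SparkSession

object EmptyOrcRoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("empty-orc-round-trip").getOrCreate()
    import spark.implicits._

    // An empty DataFrame with a concrete schema.
    val empty = Seq.empty[(Int, String)].toDF("id", "name")

    empty.write.mode("overwrite").orc("/tmp/empty_orc")
    spark.read.orc("/tmp/empty_orc").show()

    spark.stop()
  }
}
```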
						
					
03-23-2018 09:16 PM (1 Kudo)

Hi, @Sanjay Gurnani. Officially, the Apache Spark 2.2.1 Structured Streaming documentation doesn't mention ORC properly; the Apache Spark 2.3 documentation starts to include ORC.

- http://spark.apache.org/docs/2.2.1/structured-streaming-programming-guide.html
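
For illustration, a minimal sketch (assuming Spark 2.3+) of a Structured Streaming query writing ORC files; the source and paths are placeholders:

```scala
// Minimal sketch: Structured Streaming with an ORC file sink.
// The "rate" source is a toy source; replace it with Kafka, files, etc.
import org.apache.spark.sql.SparkSession

object StreamingOrcSinkExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("streaming-orc-sink").getOrCreate()

    val stream = spark.readStream.format("rate").load()

    val query = stream.writeStream
      .format("orc")
      .option("path", "/tmp/stream_orc_output")
      .option("checkpointLocation", "/tmp/stream_orc_checkpoint")
      .start()

    query.awaitTermination()
  }
}
```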
						
					
02-27-2018 04:10 PM

Hi, @prasad raju. Unfortunately, ORC doesn't support BZip2, so neither Hive nor Spark does.

- ORC Source Code
- HIVE-5067
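
For comparison, a minimal sketch of selecting one of the codecs ORC does support when writing from Spark (the path and data are placeholders):

```scala
// Minimal sketch: writing ORC with a supported codec ("snappy", "zlib", "lzo", or "none").
// BZip2 is not among the options.
import org.apache.spark.sql.SparkSession

object OrcCompressionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("orc-compression").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

    df.write
      .option("compression", "snappy")
      .mode("overwrite")
      .orc("/tmp/orc_snappy_output")

    spark.stop()
  }
}
```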
						
					
02-13-2018 04:45 AM

Thank you for confirming.
						
					
02-11-2018 09:57 PM (1 Kudo)

Hi, @Mai Nakagawa. You are using a mismatched jar file, as you saw in your first exception message: the LLAP or Hive classes are not found. That document is about HDP 2.6.1 using Spark 2.1.1. Since HDP 2.6.3, `spark-llap` for Spark 2.2 is built-in, so please use it.

    $ ls -al /usr/hdp/2.6.3.0-235/spark_llap/spark-llap-assembly-1.0.0.2.6.3.0-235.jar
    -rw-r--r-- 1 root root 61306448 Oct 30 02:39 /usr/hdp/2.6.3.0-235/spark_llap/spark-llap-assembly-1.0.0.2.6.3.0-235.jar
						
					
02-06-2018 05:22 PM

It's the memory size for a Spark executor (worker), and there is additional overhead on top of it in the executor. You need to set a proper value yourself. In a YARN environment, the memory (plus overhead) must be smaller than the YARN container limit, which is why Spark shows you that error message. It's an application property: for normal Spark jobs, users are responsible because each application can set its own `spark.executor.memory` with `spark-submit`; for the Spark Thrift Server, admins should manage it when they adjust the YARN configuration. For more information, please see http://spark.apache.org/docs/latest/configuration.html#application-properties
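
As a minimal sketch (the sizes are illustrative, and these properties are more commonly passed via `spark-submit --conf`), executor memory and its overhead can be set explicitly so that their sum stays under the YARN container limit:

```scala
// Minimal sketch: sizing executor memory explicitly. The values are placeholders;
// executor memory plus overhead must fit within the YARN container limit, or
// YARN rejects the container and Spark reports the error described above.
import org.apache.spark.sql.SparkSession

object ExecutorMemoryExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("executor-memory-sizing")
      .config("spark.executor.memory", "4g")                // executor heap
      .config("spark.yarn.executor.memoryOverhead", "512")  // off-heap overhead in MB (Spark 2.2-era name)
      .getOrCreate()

    // ... application logic ...

    spark.stop()
  }
}
```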
						
					
02-05-2018 07:08 PM

Hi, @Michael Bronson. `spark.executor.memory` seems to be 10240. Please change it in Ambari, in `spark-thrift-conf`.
						
					