Member since 
    
	
		
		
		04-24-2019
	
	
	
	
	
	
	
	
	
	
	
	
	
	
			
      
                4
            
            
                Posts
            
        
                0
            
            
                Kudos Received
            
        
                0
            
            
                Solutions
            
        
			
    
	
		
		
		04-29-2019
	
		
		01:32 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 Solved it by replacing the "USING HIVE OPTIONS (fileFormat 'ORC')"-clause with "USING ORC"-clause. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		04-26-2019
	
		
		03:32 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 I 'm trying to create an ORC-table, which can store orc-files as 'native' OrcFileFormat. Unfortunately, I 'm having an issue with using partitions and saving orc-files as 'native' OrcFileFormat.      With Partitioning  1. For partitioning, I'm using the 'PARTITION BY COLUMN' clause. This results into the following query:  CREATE TABLE livedm.events  (
     day DATE,
     datetime TIMESTAMP,
     description STRING,
     eventtype STRING
)
USING HIVE OPTIONS (fileFormat 'ORC')
PARTITIONED BY (eventtype )      2. Further on, I'm using (Py)Spark to insert data into the table. In the spark context, I added some configurations, such as spark.sql.orc.impl=native.  # Create SparkContext with configurations for new OrcFormat
session_builder = pyspark.sql.SparkSession.builder \
    .enableHiveSupport() \
    .appName("test-orc") \
    .config("spark.sql.hive.convertMetastoreOrc", "true") \
    .config("spark.sql.orc.cache.stripe.details.size", "10000") \
    .config("spark.sql.orc.enabled", "true") \
    .config("spark.sql.orc.filterPushdown", "true") \
    .config("spark.sql.orc.splits.include.file.footer", "true") \
    .config("spark.sql.hive.metastorePartitionPruning", "true") \
    .config("spark.sql.orc.impl","native") \
    .config("spark.sql.orc.enableVectorizedReader", "true")   
spark = session_builder.getOrCreate()
# event_df is a dataframe with eventdata
event_df = spark.sql("select * from tempTable")
event_df.write.mode("append").format("org.apache.spark.sql.execution.datasources.orc").insertInto("livedm.events")      3. Next, I'm using the following command to check the OrcFileFormat:  %sh
java -jar /opt/orc-test/orc-tools-1.4.3-uber.jar meta hdfs://root/apps/hive/warehouse/livedm.db/events 2> /dev/null | grep Version | head -n1  Which results into:  File Version: 0.12 with HIVE_8732  Unfortunately, orc-files have been stored as 'hive' orcFileFormat      Without Partitioning  When I 'm repeating the earlier steps, without using any partitioning on step 1, the orc-files are successfully being stored as 'native' orcFileFormat. By using the command from step 3, the following are being returned:  File Version: 0.12 with ORC_135      Question  How can I use partitioning (properly) when creating an Orc-table, and store data as 'native ' orcFileFormat? 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
							Labels:
						
						
		
			
	
					
			
		
	
	
	
	
				
		
	
	
- Labels:
- 
						
							
		
			Apache Hive
 
        





