Member since 06-09-2016

529 Posts
129 Kudos Received
104 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1737 | 09-11-2019 10:19 AM |
| | 9342 | 11-26-2018 07:04 PM |
| | 2490 | 11-14-2018 12:10 PM |
| | 5339 | 11-14-2018 12:09 PM |
| | 3154 | 11-12-2018 01:19 PM |
			
    
	
		
		
08-17-2018 11:42 AM

@Berry Österlund I think the problem might be related to some missing configuration. Please check that you have set everything as described at https://github.com/hortonworks-spark/spark-llap. HTH
08-14-2018 12:42 PM

@Mark One last thing: you may want to reconsider saving files every minute. If the files are small, you will end up causing problems for the HDFS NameNode in the long term. This is a known issue: https://community.hortonworks.com/questions/167615/what-is-small-file-problem-in-hdfs.html. We recommend avoiding lots of small files; instead, try to keep each file at least the size of an HDFS block.
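To see why one file per minute adds up, here is a quick back-of-the-envelope calculation. The ~150 bytes of NameNode heap per namespace object is a commonly cited rule of thumb, not an exact figure, and the two-objects-per-file count (one inode plus one block) assumes each file is smaller than an HDFS block:

```python
# Rough estimate of NameNode pressure from writing one small file per minute.
BYTES_PER_OBJECT = 150   # rule-of-thumb heap cost per NameNode object
OBJECTS_PER_FILE = 2     # one inode + one block for a sub-block-sized file

files_per_day = 24 * 60             # one file per minute
files_per_year = files_per_day * 365

heap_per_year = files_per_year * OBJECTS_PER_FILE * BYTES_PER_OBJECT

print(files_per_day)                # 1440 files every day
print(files_per_year)               # 525600 files after one year
print(f"{heap_per_year / 1024 / 1024:.0f} MiB of NameNode heap per year")
```

Half a million extra namespace objects per year from a single writer is exactly the kind of slow leak the linked thread warns about, and it multiplies across every job that writes at that cadence.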
08-14-2018 12:36 PM

@Mark Sorry, I just realized you were looking for a pyspark solution and I provided the Scala references instead. Everything I mentioned above also applies to pyspark, and the DataFrameWriter API link is here: https://spark.apache.org/docs/2.3.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter. HTH
08-14-2018 12:29 PM

Hi @Mark,

Here are my suggestions:

1. Before saving the RDD, I recommend you transform it to a DataFrame and use the DataFrameWriter: https://spark.apache.org/docs/2.3.1/api/scala/index.html#org.apache.spark.sql.DataFrameWriter

As for your requirement to avoid the directory and part file names, I believe this is not possible out of the box. You can write a single part file, but the directory will be created by default. You can read more here: https://community.hortonworks.com/questions/142479/pyspark-creating-directory-when-trying-to-rdd-as-s.html

One possible solution is to write to a temporary directory and then move the single file, renaming it, into the appropriate folder. You can have a single file created inside the temporary directory by using the coalesce method like this:

df.coalesce(1).write.format("json").mode("overwrite").save("temp_dir/test.json")

2. For saving JSON to an ORC Hive table, unless you plan to store it as a string column, you will need to parse the JSON and use flatMap to get the columns you want to store. You can review the DataFrameWriter saveAsTable method and example: https://spark.apache.org/docs/2.3.1/api/scala/index.html#org.apache.spark.sql.DataFrameWriter@saveAsTable(tableName:String):Unit

Also check out this article, which shows how to append to an ORC table: http://jugsi.blogspot.com/2017/12/append-data-with-spark-to-hive-oarquet.html

As always, if you found this answer addressed your question, please take a moment to log in and click the "accept" link on the answer. Thanks!
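The "write to a temp dir, then move and rename the single part file" step above can be sketched like this. This is a minimal local-filesystem sketch with a hypothetical helper name (`promote_single_part_file`); on HDFS the same move would go through `hdfs dfs -mv` or the Hadoop FileSystem API instead of `shutil`:

```python
import glob
import os
import shutil

def promote_single_part_file(temp_dir: str, final_path: str) -> None:
    """Move the single part-* file Spark wrote under temp_dir to final_path.

    Assumes the job used coalesce(1), so exactly one part file exists.
    """
    parts = glob.glob(os.path.join(temp_dir, "part-*"))
    if len(parts) != 1:
        raise RuntimeError(f"expected exactly one part file, found {len(parts)}")
    shutil.move(parts[0], final_path)
    shutil.rmtree(temp_dir)  # drop the now-unneeded Spark output directory

# Simulated usage: pretend Spark already wrote a single part file.
os.makedirs("temp_dir_test", exist_ok=True)
with open("temp_dir_test/part-00000-abc.json", "w") as f:
    f.write('{"ok": true}\n')

promote_single_part_file("temp_dir_test", "test.json")
print(os.path.exists("test.json"))  # True
```

Failing loudly when there is more than one part file is deliberate: it catches the case where someone removes the coalesce(1) and the job silently starts producing multiple parts.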
08-13-2018 06:50 PM

@harish Please keep me posted: did the above help?
08-13-2018 06:49 PM

@Is Ta Please let me know if the above has helped you. Thanks!
08-13-2018 06:48 PM

@Harun Zengin Please let me know if the above has helped you. Thanks.
08-13-2018 06:44 PM

@Girish Khole, did the above help?
08-10-2018 01:51 PM

@Takefumi Oide Only one authentication mechanism per HiveServer2. Sorry for the confusion, I copy-pasted 🙂 So that would be 4 HiveServer2 instances with a single authentication mechanism each.