Member since: 06-13-2017

25 Posts
3 Kudos Received
0 Solutions
08-30-2017 12:28 AM (1 Kudo)

Hard to tell based on the information you provided, but see if you can increase Pentaho's memory settings (edit spoon.bat). If that doesn't work, check the memory settings for Impala's catalogd daemon. Hope this helps.
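For reference, a sketch of the kind of change meant for spoon.bat (the variable name and default values vary between PDI releases, so treat these as placeholders):

    REM spoon.bat -- raise the JVM heap available to Spoon/PDI
    REM (variable name and defaults differ between PDI versions)
    set PENTAHO_DI_JAVA_OPTIONS="-Xms1024m" "-Xmx4096m"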
08-30-2017 12:22 AM

There is a uuid() function in Impala that you can use to generate surrogate keys for Kudu, or you can write an Impala UDF to generate unique BIGINTs.
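As a minimal sketch of the uuid() approach (table and column names are hypothetical, and this assumes an Impala release that ships uuid()):

    -- uuid() returns a new STRING per row, so it can serve as the
    -- surrogate key when loading into a Kudu table.
    INSERT INTO my_kudu_table
    SELECT uuid() AS surrogate_key, event_ts, payload
    FROM staging_table;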
06-19-2017 05:24 PM

The best way to deal with small files is to not have to deal with them at all. You might want to explore using Kudu or HBase as your storage engine instead of HDFS (Parquet).
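For illustration, a rough sketch of what moving such a table from Parquet on HDFS to Kudu looks like in Impala DDL (table and column names are hypothetical, and this assumes Impala with Kudu support):

    -- Parquet on HDFS: small files accumulate as data trickles in.
    CREATE TABLE metrics_parquet (id BIGINT, ts TIMESTAMP, val DOUBLE)
    STORED AS PARQUET;

    -- Kudu: rows live in Kudu tablet servers, so there are no HDFS
    -- files to compact in the first place.
    CREATE TABLE metrics_kudu (
      id  BIGINT,
      ts  TIMESTAMP,
      val DOUBLE,
      PRIMARY KEY (id)
    )
    PARTITION BY HASH (id) PARTITIONS 4
    STORED AS KUDU;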
06-16-2017 12:24 AM (1 Kudo)

If writing to Parquet, you just have to do something like:

    df.write.mode("append").parquet("/user/hive/warehouse/Mytable")

and if you want to prevent the "small file" problem:

    df.coalesce(1).write.mode("append").parquet("/user/hive/warehouse/Mytable")
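A slightly fuller Scala sketch of the same idea (the session setup, source path, and table location are assumptions for illustration):

    import org.apache.spark.sql.SparkSession

    object AppendParquet {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("append-parquet-example")
          .getOrCreate()

        // Hypothetical source; any DataFrame works here.
        val df = spark.read.json("/tmp/incoming_batch.json")

        // coalesce(1) writes one file per append, which avoids piling up
        // many small Parquet files at the cost of a single-task write.
        df.coalesce(1)
          .write
          .mode("append")
          .parquet("/user/hive/warehouse/Mytable")

        spark.stop()
      }
    }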
06-15-2017 05:30 PM

You need to configure NTP correctly. "Four NTP servers is the recommended minimum. Four servers protects against one incorrect timesource, or 'falseticker'." See https://access.redhat.com/solutions/58025 for tips on configuring NTP.
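As a minimal sketch of what the four-server recommendation looks like in /etc/ntp.conf (the pool hostnames are just examples; use servers appropriate for your environment):

    # /etc/ntp.conf -- four upstream servers so one falseticker can be outvoted
    server 0.pool.ntp.org iburst
    server 1.pool.ntp.org iburst
    server 2.pool.ntp.org iburst
    server 3.pool.ntp.org iburst

Restart the NTP service after editing so the new servers take effect.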
06-14-2017 06:32 PM

Does it have to be a sequence, or would a unique value be sufficient? If the latter, Impala has a uuid() function that you can use. Or, if a BIGINT is required, you can hash the uuid() to get a BIGINT value.
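For example (assuming an Impala release with uuid(); fnv_hash() is Impala's built-in hash function that returns a BIGINT, so the hashed value is unique only with high probability, not guaranteed):

    -- A unique STRING value per row.
    SELECT uuid();

    -- A BIGINT derived from the UUID; collisions are unlikely but possible.
    SELECT fnv_hash(uuid());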
06-14-2017 12:04 AM

You might have to add your GPFS libraries to your SPARK_CLASSPATH and LD_LIBRARY_PATH.
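A rough sketch of what that could look like in spark-env.sh (the GPFS paths below are placeholders; point them at wherever your GPFS connector jar and native libraries are actually installed):

    # spark-env.sh -- hypothetical GPFS locations, for illustration only
    export SPARK_CLASSPATH="$SPARK_CLASSPATH:/opt/gpfs/lib/gpfs-connector.jar"
    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/gpfs/lib64"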
06-13-2017 11:44 PM

One way is to use selectExpr with a cast:

    val ConvertedDF = joined.selectExpr("id", "cast(mydoublecol as double) mydoublecol")
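An equivalent sketch using the DataFrame column API instead of a SQL expression (same assumed column names):

    import org.apache.spark.sql.functions.col

    // Cast the column in place; all other columns are left untouched.
    val convertedDF = joined.withColumn("mydoublecol", col("mydoublecol").cast("double"))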