Member since 07-29-2017
	
	
	
	
	
	
	
	
	
	
	
	
	
	
			
      
Posts: 10
Kudos Received: 0
Solutions: 0
            
        
			
    
	
		
		
08-28-2018 11:43 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
Hi, I am joining two tables and one of them is skewed. How can I handle this in Spark SQL? I am using Spark 2.2.1 on AWS EMR. Please assist.
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: Apache Spark
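A common workaround for a skewed join on Spark 2.2 (which predates the adaptive skew-join handling added in Spark 3.0) is key salting: spread the hot keys of the skewed side across several artificial sub-keys, and replicate the other side to match. The sketch below is illustrative, not from the post; `skewed`, `small`, and `join_key` are placeholder names.

```scala
import org.apache.spark.sql.functions._

val numSalts = 16  // tuning knob: more salts = more even spread, more duplication

// Add a random salt 0..numSalts-1 to each row of the skewed side.
val skewedSalted = skewed.withColumn("salt", (rand() * numSalts).cast("int"))

// Replicate the other side once per salt value, so every salted key finds a match.
val smallExploded = small.withColumn("salt",
  explode(array((0 until numSalts).map(lit): _*)))

// Join on the original key plus the salt; drop the salt afterwards.
val joined = skewedSalted.join(smallExploded, Seq("join_key", "salt")).drop("salt")
```

If the smaller table fits in executor memory, a broadcast join (`skewed.join(broadcast(small), "join_key")`) avoids the shuffle entirely and sidesteps the skew altogether.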
			
    
	
		
		
02-04-2018 07:22 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@Shu Thanks a lot for the answer. In my case the non-group-by columns are string data types. Can I use string-typed non-group-by columns in the aggregation function? Can I create a temp view on the DataFrame and then use a subquery to retrieve the results? Is this possible in Structured Streaming?
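On the string-column question: aggregation functions such as `first`, `last`, `min`, `max`, and `collect_set` do accept string columns. A minimal sketch, reusing `updatedDf` from the code snippet later in the thread; `SOME_STRING_COL` is a placeholder column name, and whether a given subquery shape is accepted over a streaming view depends on Structured Streaming's supported-operations rules:

```scala
import org.apache.spark.sql.functions._

// String columns are legal inside aggregates: pick a representative value per group.
val agg = updatedDf
  .groupBy("ROW_ID", "ODS_WII_VERB")
  .agg(
    max("STG_LOAD_TS").as("max_ts"),
    first("SOME_STRING_COL").as("some_string_col")  // non-deterministic pick per group
  )

// A temp view over a streaming DataFrame is allowed; plain aggregate SQL over it works,
// though some subquery forms are rejected by the streaming planner.
updatedDf.createOrReplaceTempView("updates")
val viaSql = sparkSession.sql(
  "SELECT ROW_ID, ODS_WII_VERB, max(STG_LOAD_TS) AS max_ts " +
  "FROM updates GROUP BY ROW_ID, ODS_WII_VERB")
```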
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
02-04-2018 05:28 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
Hi @Shu, I have a few other columns apart from the ROW_ID and ODS_WII_VERB columns in the input, but they are not part of the GROUP BY clause. How can I retrieve those columns as well?
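For batch queries, the usual answer to this question is a window function: rank the rows within each group by timestamp and keep the top row, so every other column survives untouched. A sketch under the assumption of a batch DataFrame named `inputDf` (window functions are not supported over streaming DataFrames in Spark 2.2, so this applies to the batch case only):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Rank rows within each (ROW_ID, ODS_WII_VERB) group, newest timestamp first.
val w = Window.partitionBy("ROW_ID", "ODS_WII_VERB").orderBy(col("STG_LOAD_TS").desc)

// Keep only the latest row per group; all other columns come along for free.
val latest = inputDf
  .withColumn("rn", row_number().over(w))
  .where(col("rn") === 1)
  .drop("rn")
```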
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
02-03-2018 08:45 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
Hi, below are the input and output schemas.

Input: row_id, ODS_WII_VERB, stg_load_ts, other_columns

Output: the maximum timestamp, grouped by row_id and ODS_WII_VERB

Issue: because only row_id and ODS_WII_VERB appear in the GROUP BY clause, we are unable to retrieve the other columns. How can we get those columns as well? We tried a Spark SQL subquery, but subqueries do not seem to work in Spark Structured Streaming. How can this be resolved?

Code snippet:

    val csvDF = sparkSession
      .readStream
      .option("sep", ",")
      .schema(userSchema)
      .csv("C:\\Users\\M1037319\\Desktop\\data")

    val updatedDf = csvDF.withColumn("ODS_WII_VERB", regexp_replace(col("ODS_WII_VERB"), "I", "U"))
    updatedDf.printSchema()

    val grpbyDF = updatedDf.groupBy("ROW_ID", "ODS_WII_VERB").max("STG_LOAD_TS")
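One streaming-compatible way to carry non-grouped columns through this aggregation is the max-of-struct trick: pack the timestamp as the first field of a struct, take `max` of the whole struct (structs compare field by field, so the comparison is driven by STG_LOAD_TS), then unpack. A sketch building on `updatedDf` above; `OTHER_COL1` and `OTHER_COL2` are placeholders for the post's "other_columns":

```scala
import org.apache.spark.sql.functions._

// max(struct(ts, ...)) selects, per group, the whole row-struct with the largest ts,
// so the non-grouped columns ride along with the winning timestamp.
val latest = updatedDf
  .groupBy("ROW_ID", "ODS_WII_VERB")
  .agg(max(struct(col("STG_LOAD_TS"), col("OTHER_COL1"), col("OTHER_COL2"))).as("m"))
  .select(
    col("ROW_ID"), col("ODS_WII_VERB"),
    col("m.STG_LOAD_TS"), col("m.OTHER_COL1"), col("m.OTHER_COL2"))
```

Because this stays a single `groupBy` aggregation, it fits Structured Streaming's supported operations (with an appropriate output mode), unlike a self-join or subquery against the stream.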
						
					
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: Apache Spark
