08-28-2018
11:43 AM
Hi, I am joining two tables in Spark SQL, and one of the tables is skewed. How can I handle this skew? I am using Spark 2.2.1 on AWS EMR. Please assist.
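One common mitigation (not from this thread; a hedged sketch) is key salting: append a random suffix to the join key on the skewed side, replicate the smaller side once per suffix, and join on the salted key so the hot key spreads across partitions. A minimal Scala sketch, where `large`, `small`, and the join column `id` are assumed names:

```scala
import org.apache.spark.sql.functions._

val saltBuckets = 10

// Skewed side: append a random salt 0..9 to each row's key
val saltedLarge = large.withColumn(
  "salted_id",
  concat(col("id").cast("string"), lit("#"),
         (rand() * saltBuckets).cast("int").cast("string"))
)

// Small side: replicate each row once per salt bucket so every salted key matches
val salts = array((0 until saltBuckets).map(lit): _*)
val saltedSmall = small
  .withColumn("salt", explode(salts))
  .withColumn("salted_id",
    concat(col("id").cast("string"), lit("#"), col("salt").cast("string")))

val joined = saltedLarge.join(saltedSmall, "salted_id")
```

If the smaller table fits in executor memory, a simpler fix is to skip salting and broadcast it, e.g. `large.join(broadcast(small), "id")`, which avoids the skewed shuffle entirely.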
Labels:
- Apache Spark
02-04-2018
07:22 AM
@Shu Thanks a lot for the answer. In my case the non-group-by columns are string data types. Can I use string-typed non-group-by columns inside the aggregation function? Can I create a temp view on the DataFrame and then use a subquery to retrieve the results? Is this possible in Structured Streaming?
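Registering a streaming DataFrame as a temp view and running plain SQL aggregations over it is supported; a sketch, with the view name assumed and column names taken from this thread:

```scala
// Register the streaming DataFrame as a SQL view (view name is an assumption)
updatedDf.createOrReplaceTempView("staged_rows")

// A straight streaming aggregation over the view works
val latest = sparkSession.sql("""
  SELECT ROW_ID, ODS_WII_VERB, MAX(STG_LOAD_TS) AS MAX_TS
  FROM staged_rows
  GROUP BY ROW_ID, ODS_WII_VERB
""")
```

Note, however, that correlated subqueries and joining a stream back to itself are among the operations Structured Streaming does not support on Spark 2.2, which is likely why the subquery approach failed.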
02-04-2018
05:28 AM
Hi @Shu, I have a few other columns in the input apart from ROW_ID and ODS_WII_VERB, but they are not part of the group-by clause. How can I retrieve those columns as well?
02-03-2018
08:45 AM
Hi, below are the input and output schemas.

i/p: row_id, ODS_WII_VERB, stg_load_ts, other_columns
o/p: the max timestamp, grouped by row_id and ODS_WII_VERB

Issue: because only row_id and ODS_WII_VERB appear in the group-by clause, we are unable to retrieve the other columns. How can we get the other columns as well? We tried writing a Spark SQL subquery, but subqueries do not seem to work in Spark Structured Streaming.

Code snippet:

val csvDF = sparkSession
  .readStream
  .option("sep", ",")
  .schema(userSchema)
  .csv("C:\\Users\\M1037319\\Desktop\\data")

val updatedDf = csvDF.withColumn("ODS_WII_VERB",
  regexp_replace(col("ODS_WII_VERB"), "I", "U"))
updatedDf.printSchema()

val grpbyDF = updatedDf.groupBy("ROW_ID", "ODS_WII_VERB").max("STG_LOAD_TS")
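One way to carry the non-grouped columns through the aggregation without a subquery or self-join (a sketch, not confirmed in this thread) is to take the `max` of a struct whose first field is the timestamp: Spark orders structs field by field, so the maximum struct is the one with the latest STG_LOAD_TS, and the remaining fields ride along. `OTHER_COL` below is a placeholder for the real column names:

```scala
import org.apache.spark.sql.functions._

// Timestamp first inside the struct, so max() orders by STG_LOAD_TS
// and carries the other (e.g. string) columns along with the winner.
val latestPerKey = updatedDf
  .groupBy("ROW_ID", "ODS_WII_VERB")
  .agg(max(struct(col("STG_LOAD_TS"), col("OTHER_COL"))).as("m"))
  .select(
    col("ROW_ID"),
    col("ODS_WII_VERB"),
    col("m.STG_LOAD_TS").as("STG_LOAD_TS"),
    col("m.OTHER_COL").as("OTHER_COL")
  )
```

This stays a plain streaming aggregation, so it avoids the stream self-join that Spark 2.2 does not support.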
Labels:
- Apache Spark