
Spark Scala DataFrame adding new column error

Solved

Contributor

Hi All,

I'm trying to add a column to a DataFrame based on multiple check conditions. One of the operations we need is a sum over rows, but I'm getting the error below:

Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.Dataset [Column_12: double]
    at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:77)
    at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
    at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
    at scala.util.Try.getOrElse(Try.scala:79)
    at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:162)
    at org.apache.spark.sql.functions$.typedLit(functions.scala:112)
    at org.apache.spark.sql.functions$.lit(functions.scala:95)
    at MYDev.ReconTest$.main(ReconTest.scala:35)
    at MYDev.ReconTest.main(ReconTest.scala)

and the query I'm using is:

var df = inputDf
df = df.persist()
inputDf = inputDf.withColumn("newColumn",
  when(df("MinBusinessDate") < "2018-08-8" && df("MaxBusinessDate") > "2018-08-08",
    lit(df.groupBy(df("tableName"), df("runDate"))
      .agg(sum(when(df("business_date") > "2018-08-08", df("rowCount")))
        .alias("finalSRCcount"))
      .drop("tableName", "runDate"))))

Cheers,

MJ

1 ACCEPTED SOLUTION


Re: Spark Scala DataFrame adding new column error

Contributor

Hi, the issue got resolved.

I was trying to perform a groupBy operation inside a column literal. Since groupBy/agg itself produces the new column, the query should not be written the way I had it above; it has to be restructured as follows:

inputDf = inputDf.groupBy(col("tableName"), col("runDate"))
  .agg(sum(when(col("MinBusinessDate") < col("runDate") && col("MaxBusinessDate") > col("runDate"),
    when(col("business_date") > col("runDate"), col("rowCount")))).alias("NewColumnName"))
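
To make the fix concrete, here is a self-contained sketch with made-up sample data (the table names, dates, and counts are illustrative only, not from the original job):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("ReconTest").master("local[*]").getOrCreate()
import spark.implicits._

// Illustrative input: one row per (tableName, runDate, business_date).
var inputDf = Seq(
  ("t1", "2018-08-08", "2018-08-01", "2018-08-31", "2018-08-09", 10.0),
  ("t1", "2018-08-08", "2018-08-01", "2018-08-31", "2018-08-07", 5.0),
  ("t2", "2018-08-08", "2018-08-10", "2018-08-31", "2018-08-11", 7.0)
).toDF("tableName", "runDate", "MinBusinessDate", "MaxBusinessDate", "business_date", "rowCount")

// The aggregation lives inside agg(), so no lit() is involved: per
// (tableName, runDate) group, rowCount is summed only for rows that pass
// both date checks; rows that fail yield null, which sum() skips.
inputDf = inputDf.groupBy(col("tableName"), col("runDate"))
  .agg(sum(when(col("MinBusinessDate") < col("runDate") && col("MaxBusinessDate") > col("runDate"),
    when(col("business_date") > col("runDate"), col("rowCount")))).alias("NewColumnName"))

inputDf.show()

Note that the dates here are yyyy-MM-dd strings, so the < and > comparisons are lexicographic; that happens to be correct for zero-padded dates, but casting to DateType would be more robust.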
