Support Questions

Find answers, ask questions, and share your expertise
Announcements
Now Live: Explore expert insights and technical deep dives on the new Cloudera Community BlogsRead the Announcement

Unsupported literal type class in Apache Spark in scala

avatar

Hi All,

I'm trying to add a column to a dataframe based on multiple check condition, one of the operation that we are doing is we need to take sum of rows, but im getting Below error:

Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.Dataset [StorageDayCountBeore: double] at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:77) at org.apache.spark.sql.catalyst.expressions.Literal$anonfun$create$2.apply(literals.scala:163) at org.apache.spark.sql.catalyst.expressions.Literal$anonfun$create$2.apply(literals.scala:163) at scala.util.Try.getOrElse(Try.scala:79) at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:162) at org.apache.spark.sql.functions$.typedLit(functions.scala:112) at org.apache.spark.sql.functions$.lit(functions.scala:95) at MYDev.ReconTest$.main(ReconTest.scala:35) at MYDev.ReconTest.main(ReconTest.scala)

and the Query im using is:

var df = inputDf
df = df.persist()
inputDf = inputDf.withColumn("newColumn",
when(df("MinBusinessDate") < "2018-08-8" && df("MaxBusinessDate") > "2018-08-08",
lit(df.groupBy(df("tableName"),df("runDate"))
.agg(sum(when(df("business_date") > "2018-08-08", df("rowCount")))
.alias("finalSRCcount"))
.drop("tableName","runDate"))))

Cheers,

MJ

1 ACCEPTED SOLUTION

avatar

Hi Issue got resolved,

i'm trying to perform Group by operation inside a Columns literal, group by itself will produce a new columns instead writing a query like i asked above we have to change our query accordingly as follow.

inputDf = inputDf.groupBy(col("tableName"),col("runDate"))
.agg(sum(when(col("MinBusinessDate") < col("runDate") && col("MaxBusinessDate") > col("runDate"),
when(col("business_date") > col("runDate"), col("rowCount")))).alias("NewColumnName"))

View solution in original post

1 REPLY 1

avatar

Hi Issue got resolved,

i'm trying to perform Group by operation inside a Columns literal, group by itself will produce a new columns instead writing a query like i asked above we have to change our query accordingly as follow.

inputDf = inputDf.groupBy(col("tableName"),col("runDate"))
.agg(sum(when(col("MinBusinessDate") < col("runDate") && col("MaxBusinessDate") > col("runDate"),
when(col("business_date") > col("runDate"), col("rowCount")))).alias("NewColumnName"))