
Spark Scala DataFrame adding new column error

Solved

Contributor

Hi All,

I'm trying to add a column to a DataFrame based on multiple check conditions. One of the operations we need is a sum over rows, but I'm getting the error below:

Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.Dataset [Column_12: double]
    at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:77)
    at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
    at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:163)
    at scala.util.Try.getOrElse(Try.scala:79)
    at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:162)
    at org.apache.spark.sql.functions$.typedLit(functions.scala:112)
    at org.apache.spark.sql.functions$.lit(functions.scala:95)
    at MYDev.ReconTest$.main(ReconTest.scala:35)
    at MYDev.ReconTest.main(ReconTest.scala)

and the query I'm using is:

var df = inputDf
df = df.persist()
inputDf = inputDf.withColumn("newColumn",
  when(df("MinBusinessDate") < "2018-08-8" && df("MaxBusinessDate") > "2018-08-08",
    lit(df.groupBy(df("tableName"), df("runDate"))
      .agg(sum(when(df("business_date") > "2018-08-08", df("rowCount")))
        .alias("finalSRCcount"))
      .drop("tableName", "runDate"))))

Cheers,

MJ

1 ACCEPTED SOLUTION


Re: Spark Scala DataFrame adding new column error

Contributor

Hi, the issue got resolved.

I was trying to perform a groupBy operation inside a column literal. Since groupBy/agg itself produces the new column, the query should not be written the way I had it above; it has to be restructured as follows:

inputDf = inputDf.groupBy(col("tableName"), col("runDate"))
  .agg(sum(when(col("MinBusinessDate") < col("runDate") && col("MaxBusinessDate") > col("runDate"),
    when(col("business_date") > col("runDate"), col("rowCount")))).alias("NewColumnName"))
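
To make the fix concrete, here is a self-contained sketch with made-up sample data (the table names, dates, and counts are illustrative only, not from the original job):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("ReconTest").master("local[*]").getOrCreate()
import spark.implicits._

// Illustrative input: one row per (tableName, runDate, business_date).
var inputDf = Seq(
  ("t1", "2018-08-08", "2018-08-01", "2018-08-31", "2018-08-09", 10.0),
  ("t1", "2018-08-08", "2018-08-01", "2018-08-31", "2018-08-07", 5.0),
  ("t2", "2018-08-08", "2018-08-10", "2018-08-31", "2018-08-11", 7.0)
).toDF("tableName", "runDate", "MinBusinessDate", "MaxBusinessDate", "business_date", "rowCount")

// The aggregation lives inside agg(), so no lit() is involved: per
// (tableName, runDate) group, rowCount is summed only for rows that pass
// both date checks; rows that fail yield null, which sum() skips.
inputDf = inputDf.groupBy(col("tableName"), col("runDate"))
  .agg(sum(when(col("MinBusinessDate") < col("runDate") && col("MaxBusinessDate") > col("runDate"),
    when(col("business_date") > col("runDate"), col("rowCount")))).alias("NewColumnName"))

inputDf.show()

Note that the dates here are yyyy-MM-dd strings, so the < and > comparisons are lexicographic; that happens to be correct for zero-padded dates, but casting to DateType would be more robust.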
