Cloudera Community: Support Questions

Unsupported literal type class in Apache Spark in Scala

manikandanjeyab · 08-26-2018

Hi All,

I'm trying to add a column to a DataFrame based on multiple conditions. One of the operations needs to take the sum of rows, but I'm getting the error below:

Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class org.apache.spark.sql.Dataset [StorageDayCountBeore: double]
  at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:77)
  at org.apache.spark.sql.catalyst.expressions.Literal$anonfun$create$2.apply(literals.scala:163)
  at org.apache.spark.sql.catalyst.expressions.Literal$anonfun$create$2.apply(literals.scala:163)
  at scala.util.Try.getOrElse(Try.scala:79)
  at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:162)
  at org.apache.spark.sql.functions$.typedLit(functions.scala:112)
  at org.apache.spark.sql.functions$.lit(functions.scala:95)
  at MYDev.ReconTest$.main(ReconTest.scala:35)
  at MYDev.ReconTest.main(ReconTest.scala)

The query I'm using is:

var df = inputDf
df = df.persist()
inputDf = inputDf.withColumn("newColumn",
  when(df("MinBusinessDate") < "2018-08-8" && df("MaxBusinessDate") > "2018-08-08",
    lit(df.groupBy(df("tableName"),df("runDate"))
      .agg(sum(when(df("business_date") > "2018-08-08", df("rowCount")))
        .alias("finalSRCcount"))
      .drop("tableName","runDate"))))

Cheers,

MJ

manikandanjeyab · 08-27-2018

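Hi, the issue is resolved.

I was trying to perform a groupBy operation inside a column literal (lit). lit expects a plain value, but the groupBy expression returns a Dataset, which is what the error reports. groupBy with agg produces a new column by itself, so instead of writing the query as in my question above, we have to change it as follows: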

inputDf = inputDf.groupBy(col("tableName"),col("runDate"))
  .agg(sum(when(col("MinBusinessDate") < col("runDate") && col("MaxBusinessDate") > col("runDate"),
    when(col("business_date") > col("runDate"), col("rowCount")))).alias("NewColumnName"))
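
Note that the groupBy version above returns one row per (tableName, runDate) group rather than adding a column to every input row. If the original rows need to be kept, one possible alternative (not from the original post) is to compute the same conditional sum over a window. This is only a minimal sketch, assuming the column names from this thread, dates stored as yyyy-MM-dd strings, and made-up sample rows. The object name ConditionalSumSketch and the helper withConditionalSum are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum, when}

// Hypothetical sketch: names below are illustrative, not from the original post.
object ConditionalSumSketch {

  // Adds the per-group conditional sum as a new column and keeps every input row.
  // "finalSRCcount" is the alias used in the question.
  def withConditionalSum(df: DataFrame): DataFrame = {
    // One window per (tableName, runDate); no ordering, so the sum covers the whole group.
    val byTableAndRun = Window.partitionBy(col("tableName"), col("runDate"))

    // Same nested conditions as the groupBy query; rows that do not match yield null,
    // and sum ignores nulls.
    val counted =
      when(col("MinBusinessDate") < col("runDate") && col("MaxBusinessDate") > col("runDate"),
        when(col("business_date") > col("runDate"), col("rowCount")))

    df.withColumn("finalSRCcount", sum(counted).over(byTableAndRun))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ConditionalSumSketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows, purely for illustration.
    val df = Seq(
      ("t1", "2018-08-08", "2018-08-01", "2018-08-20", "2018-08-09", 10L),
      ("t1", "2018-08-08", "2018-08-01", "2018-08-20", "2018-08-07", 5L),
      ("t1", "2018-08-08", "2018-08-01", "2018-08-20", "2018-08-10", 7L)
    ).toDF("tableName", "runDate", "MinBusinessDate", "MaxBusinessDate",
      "business_date", "rowCount")

    withConditionalSum(df).show()
    spark.stop()
  }
}
```

Whether the window or the groupBy form fits better depends on whether you want one summary row per group or the sum repeated on each original row.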
