Fetch AVG using Dataframe in Spark


Hi all,

I want to find the average of the 'rate' column using Scala code in Spark. To do that, I created a DataFrame and a temporary view, and I query the view with Spark SQL. A plain select query on the view returns the expected output, but when I run avg with group by on the same view, it returns no records.

data.txt is a tab-separated file.


abandon     -2
abandoned -2
abandons -2


val AFINN = sc.textFile("hdfs://").map(x=> x.split("\t")).map(x=>(x(0).toString,x(1).toInt))
//AFINN: org.apache.spark.rdd.RDD[(String, Int)]

val AFINNDF = AFINN.toDF("word","rate")
//AFINNDF: org.apache.spark.sql.DataFrame = [word: string, rate: int]

AFINNDF.createOrReplaceTempView("temp")

val DF = spark.sql("select word,rate from temp")
//DF: org.apache.spark.sql.DataFrame = [word: string, rate: int]


+----------+----+
|      word|rate|
+----------+----+
|   abandon|  -2|
| abandoned|  -2|
|  abandons|  -2|
+----------+----+

val DF = spark.sql("select word,avg(rate) as rating from temp group by word")
//DF: org.apache.spark.sql.DataFrame = [word: string, rating: double]



How can I find the average using Spark SQL queries in Scala?
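
For reference, here is a minimal sketch of the aggregation I am trying to get working. It is not my real job: the in-memory rows in `sample` are hypothetical and stand in for the HDFS file, the local `SparkSession` is an assumption for running outside spark-shell (where `spark` already exists), and the names `sample`, `perWord`, `perWordApi` and `overall` are made up for illustration. Note that `spark.sql(...)` only returns a DataFrame and prints its schema in the shell; rows appear only after an action such as `show()`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("avg-sketch").getOrCreate()
import spark.implicits._

// Hypothetical in-memory rows standing in for the tab-separated file.
val sample = Seq(("abandon", -2), ("abandoned", -2), ("abandons", -2)).toDF("word", "rate")
sample.createOrReplaceTempView("temp")

// Per-word average through Spark SQL; show() is the action that prints the rows.
val perWord = spark.sql("select word, avg(rate) as rating from temp group by word")
perWord.show()

// The same aggregation through the DataFrame API.
val perWordApi = sample.groupBy("word").agg(avg("rate").as("rating"))
perWordApi.show()

// Average over the whole 'rate' column, without grouping.
val overall = spark.sql("select avg(rate) as rating from temp").first().getDouble(0)
```

Because each word occurs once in this sample, the per-word average equals the word's own rate; the ungrouped query is the one that averages the whole column.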