Fetch AVG using Dataframe in Spark

Hi all,

I want to find the average of the 'rate' column using Scala code in Spark. To do that, I created a DataFrame and a temporary view, and then used Spark SQL to query it. When I run a plain select query on the view, it returns the expected output, but when I use avg with group by on the same view, it returns no records.

data.txt is a tab-separated file.

abandon     -2
abandoned   -2
abandons    -2
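
The parsing code below assumes every line contains exactly one tab. The sample rows above appear with spaces; if any line in the real file uses spaces instead of a tab, `x.split("\t")` returns a one-element array and `x(1)` throws `ArrayIndexOutOfBoundsException` when a Spark action reads that line, so it may be worth ruling that out. Here is a minimal plain-Scala sketch (no Spark needed) of a more defensive parser; `AfinnParse`, `parseLine` and `parseAll` are hypothetical names, and skipping malformed lines is an assumption about how they should be handled.

```scala
// Plain-Scala sketch of the line-parsing step (no Spark required).
// Hypothetical helper: assumes a valid line is "<word>\t<integer rate>"
// and silently skips anything else.
object AfinnParse {
  // Returns Some((word, rate)) for a well-formed line, None otherwise.
  def parseLine(line: String): Option[(String, Int)] =
    line.split("\t") match {
      case Array(word, rate) =>
        // trim removes stray spaces and a trailing '\r' from Windows line endings;
        // Try turns a non-numeric rate into None instead of an exception
        scala.util.Try(rate.trim.toInt).toOption.map(r => (word.trim, r))
      case _ =>
        None // no tab (e.g. spaces used instead) or too many fields
    }

  // Keeps only the lines that parse successfully.
  def parseAll(lines: Seq[String]): Seq[(String, Int)] =
    lines.flatMap(line => parseLine(line))
}
```

In spark-shell, something like `sc.textFile(path).flatMap(line => AfinnParse.parseLine(line))` (with `path` standing in for the HDFS path) should produce the same `RDD[(String, Int)]` as the two `map` calls while dropping malformed lines; comparing `count()` on the raw and parsed RDDs would show whether any lines were skipped.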

val AFINN = sc.textFile("hdfs://sandbox-hdp.hortonworks.com:8020/Input/data.txt").map(x => x.split("\t")).map(x => (x(0), x(1).trim.toInt))
//AFINN: org.apache.spark.rdd.RDD[(String, Int)]

val AFINNDF = AFINN.toDF("word","rate")
//AFINNDF: org.apache.spark.sql.DataFrame = [word: string, rate: int]
AFINNDF.createOrReplaceTempView("temp")

val DF = spark.sql("select word,rate from temp")
//DF: org.apache.spark.sql.DataFrame = [word: string, rate: int]
DF.show()

Output:

+----------+----+
|      word|rate|
+----------+----+
|   abandon|  -2|
| abandoned|  -2|
|  abandons|  -2|
+----------+----+

val DF = spark.sql("select word,avg(rate) as rating from temp group by word")
//DF: org.apache.spark.sql.DataFrame = [word: string, rating: double]
DF.show()

Output:

+----+------+
|word|rating|
+----+------+
+----+------+
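
The GROUP BY query computes one average per distinct word, so when every word occurs once (as in the sample), each `rating` should simply equal that word's `rate`; for non-empty input, a GROUP BY always yields at least one group, so an empty result is not what the query itself should produce. The plain-Scala sketch below mirrors the query's semantics on in-memory rows, which can help check what output to expect; `WordAverage` and `avgByWord` are hypothetical names, not Spark APIs. In the DataFrame API, the equivalent of the query would be `AFINNDF.groupBy("word").agg(avg("rate").as("rating"))` with `import org.apache.spark.sql.functions.avg`.

```scala
// Plain-Scala model of: SELECT word, avg(rate) AS rating FROM temp GROUP BY word
// (hypothetical helper for checking expected output; not a Spark API).
object WordAverage {
  def avgByWord(rows: Seq[(String, Int)]): Map[String, Double] =
    rows.groupBy(_._1).map { case (word, group) =>
      // one output row per distinct word: the mean of that word's rates
      // (summing as Long avoids Int overflow on large groups)
      word -> (group.map(_._2.toLong).sum.toDouble / group.size)
    }
}
```

With the three sample rows, every word maps to `-2.0`.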

How can I compute the average using Spark SQL queries in Scala?
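
If the goal is the average of the whole `rate` column rather than one value per word, the query needs no GROUP BY at all: `select avg(rate) as rating from temp` returns a single row. A minimal plain-Scala sketch of that aggregate follows; `ColumnAverage` and `avgRate` are hypothetical names, and SQL `AVG` over zero rows (which yields NULL) is modeled here as `None`.

```scala
// Plain-Scala model of: SELECT avg(rate) AS rating FROM temp
// (hypothetical helper; not a Spark API). No GROUP BY means one result for all rows.
object ColumnAverage {
  def avgRate(rows: Seq[(String, Int)]): Option[Double] =
    if (rows.isEmpty) None // SQL AVG over zero rows is NULL
    else Some(rows.map(_._2.toLong).sum.toDouble / rows.size) // Long sum avoids Int overflow
}
```

For the three sample rows this gives `Some(-2.0)`. The DataFrame equivalent would be `AFINNDF.agg(avg("rate").as("rating"))`, again with `import org.apache.spark.sql.functions.avg`.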

Thanks,

Jay.