Created on 11-27-2019 06:50 AM - last edited on 11-27-2019 09:29 AM by ask_bill_brooks
Hi all,
I want to find the average of the 'rate' column using Scala code in Spark. To do that, I created a DataFrame and a temporary view, and I query the view with Spark SQL. A plain select query on the view returns the expected output, but when I run avg with group by on the same view, it returns no records.
data.txt is a tab-separated file.
abandon -2
abandoned -2
abandons -2
val AFINN = sc.textFile("hdfs://sandbox-hdp.hortonworks.com:8020/Input/data.txt").map(x=> x.split("\t")).map(x=>(x(0).toString,x(1).toInt))
//AFINN: org.apache.spark.rdd.RDD[(String, Int)]
val AFINNDF = AFINN.toDF("word","rate")
//AFINNDF: org.apache.spark.sql.DataFrame = [word: string, rate: int]
AFINNDF.createOrReplaceTempView("temp")
val DF = spark.sql("select word,rate from temp")
//DF: org.apache.spark.sql.DataFrame = [word: string, rate: int]
DF.show()
Output:
+----------+----+
|      word|rate|
+----------+----+
|   abandon|  -2|
| abandoned|  -2|
|  abandons|  -2|
+----------+----+
val DF = spark.sql("select word,avg(rate) as rating from temp group by word")
//DF: org.apache.spark.sql.DataFrame = [word: string, rating: double]
DF.show()
Output:
+----+------+
|word|rating|
+----+------+
+----+------+
How can I find the average using Spark SQL queries in Scala?
Thanks,
Jay.
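
One way to check whether the query itself is at fault is to run the same aggregation against the three sample rows held in memory instead of the HDFS file. The following is only a minimal sketch under stated assumptions: it assumes Spark 2.x or later on the classpath and a local SparkSession, and the object name AvgRateSketch and the helper averageByWord are hypothetical names introduced for illustration. It does not explain why the HDFS-backed view returned no rows; it only isolates the SQL from the input data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.avg

// Hypothetical object name, used only for this sketch.
object AvgRateSketch {

  // Hypothetical helper: registers the three sample rows from the post as the
  // temporary view "temp" and returns the grouped-average query result.
  def averageByWord(spark: SparkSession): DataFrame = {
    import spark.implicits._ // enables .toDF on a local Seq of tuples

    val afinnDF = Seq(("abandon", -2), ("abandoned", -2), ("abandons", -2))
      .toDF("word", "rate")
    afinnDF.createOrReplaceTempView("temp")

    spark.sql("SELECT word, AVG(rate) AS rating FROM temp GROUP BY word")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .appName("AvgRateSketch")
      .master("local[*]")
      .getOrCreate()

    // Average rate per word, via Spark SQL on the view.
    averageByWord(spark).show()

    // Average over the whole 'rate' column (no GROUP BY).
    spark.sql("SELECT AVG(rate) AS rating FROM temp").show()

    // The same per-word average expressed with the DataFrame API.
    spark.table("temp").groupBy("word").agg(avg("rate").as("rating")).show()

    spark.stop()
  }
}
```

If this version prints one row per word while the HDFS-backed view still comes back empty, the difference most likely lies in how the file is read or parsed rather than in the query text.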