
Fetch AVG using Dataframe in Spark


Hi all,

I want to find the average of the 'rate' column using Scala code in Spark. To do that, I created a DataFrame and a temporary view, then used Spark SQL for the queries. A plain select query on the view gives the correct output, but when I use avg with group by on the same view, it returns no records.

data.txt is a tab-separated file with lines like these:


abandon     -2
abandoned -2
abandons -2
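The loading code below splits each line on a single tab and reads x(1). If any line uses spaces instead of a tab (the sample above renders with spaces), that line yields a single field and x(1) throws an ArrayIndexOutOfBoundsException. A more tolerant parser could look like the following minimal sketch. The object AfinnParse and the method parseLine are hypothetical names, not part of the original post. It takes the score as the text after the last whitespace character, so tabs and spaces both work and a key containing spaces keeps them, and it returns None for blank or malformed lines instead of throwing.

```scala
import scala.util.Try

// Hypothetical helper, not from the original post.
object AfinnParse {
  // Parses one "word<whitespace>score" line.
  // The score is the text after the LAST whitespace character; everything
  // before it (trimmed) is the word. Blank or malformed lines yield None.
  def parseLine(line: String): Option[(String, Int)] = {
    val trimmed = line.trim
    val cut = trimmed.lastIndexWhere(_.isWhitespace)
    if (cut <= 0) None // no separator at all
    else {
      val word = trimmed.substring(0, cut).trim
      val score = trimmed.substring(cut + 1)
      // Try(...).toOption also works on Scala 2.11, which lacks toIntOption
      Try(score.toInt).toOption.map(rate => (word, rate))
    }
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("abandon\t-2", "abandoned -2", "abandons     -2", "", "broken")
    // Prints the three parsed pairs; the last two lines are skipped
    lines.flatMap(parseLine).foreach(println)
  }
}
```

In spark-shell it could replace the two map calls, for example sc.textFile(path).flatMap(line => AfinnParse.parseLine(line).toList).toDF("word", "rate"), where path is the HDFS location used below.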


val AFINN = sc.textFile("hdfs://sandbox-hdp.hortonworks.com:8020/Input/data.txt").map(x => x.split("\t")).map(x => (x(0), x(1).toInt))
// AFINN: org.apache.spark.rdd.RDD[(String, Int)]

val AFINNDF = AFINN.toDF("word", "rate")
// AFINNDF: org.apache.spark.sql.DataFrame = [word: string, rate: int]
AFINNDF.createOrReplaceTempView("temp")

val DF = spark.sql("select word, rate from temp")
// DF: org.apache.spark.sql.DataFrame = [word: string, rate: int]
DF.show()

Output:

+----------+----+
|      word|rate|
+----------+----+
|   abandon|  -2|
| abandoned|  -2|
|  abandons|  -2|
+----------+----+
val DF = spark.sql("select word, avg(rate) as rating from temp group by word")
// DF: org.apache.spark.sql.DataFrame = [word: string, rating: double]
DF.show()

Output:

+----+------+
|word|rating|
+----+------+
+----+------+
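"Average of the 'rate' column" can mean two different queries. Without GROUP BY, avg(rate) returns one row with the mean over all rows. With GROUP BY word, it returns one row per distinct word; since each word appears once in the sample, each per-word average simply equals that word's rate. The minimal sketch below runs both forms, in SQL and with the DataFrame API, assuming a local SparkSession and the three sample rows built in memory instead of read from HDFS. The names AfinnAverage and registerSample are hypothetical. It only illustrates the query shapes; it does not explain the empty result above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.avg

// Hypothetical example object, not from the original post.
object AfinnAverage {
  // Builds the three sample rows in memory and registers them as view "temp".
  def registerSample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq(("abandon", -2), ("abandoned", -2), ("abandons", -2)).toDF("word", "rate")
    df.createOrReplaceTempView("temp")
    df
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; spark-shell already provides `spark`.
    val spark = SparkSession.builder()
      .appName("afinn-avg-sketch")
      .master("local[*]")
      .getOrCreate()
    val afinnDF = registerSample(spark)

    // Average of the whole 'rate' column: no GROUP BY, so a single row.
    spark.sql("select avg(rate) as rating from temp").show()

    // Average per word: one row per distinct word.
    spark.sql("select word, avg(rate) as rating from temp group by word").show()

    // The same two aggregations with the DataFrame API.
    afinnDF.agg(avg("rate").as("rating")).show()
    afinnDF.groupBy("word").agg(avg("rate").as("rating")).show()

    spark.stop()
  }
}
```

In spark-shell, the two spark.sql lines can be run as-is against the existing "temp" view; running them on both the in-memory sample and the HDFS-backed view is one way to compare their results.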

How can I compute the average using Spark SQL queries in Scala?

Thanks,

Jay.
