Support Questions
Find answers, ask questions, and share your expertise

Hi i am new to spark i need to find out the total weight and total hours by using subscriber id.please make use spark core and spark sql

New Contributor

I have a file in that data like Subscriber Id,Weight,Hours

Subscriber Id,Weight,Hours

123,10,7,

456,15,10

123,20,15

I want the output like

123,30,22

456,15,10

thanks and regards

1 REPLY 1

Super Guru
@Sura Reddy

We need to use Groupby on subscriberid and sum weight,Hours to get your desired result.

Example:

Using DataFrameAPI:

Create dataframe as shown below

>>> df=sqlContext.createDataFrame([(123,10,7),(456,15,10),(123,20,15)],["SubscriberId","Weight","Hours"])

Check the data in the dataframe.

>>> df.show(10,False)
+------------+------+-----+
|SubscriberId|Weight|Hours|
+------------+------+-----+
|123         |10    |7    |
|456         |15    |10   |
|123         |20    |15   |
+------------+------+-----+

GroupBy and sum on the columns and select the columns

>>> df.groupBy("SubscriberId").agg(sum(col("Weight")).alias("Weight"),sum(col("Hours")).alias("Hours")).select("SubscriberId","Weight","Hours").show(10,False)
+------------+------+-----+
|SubscriberId|Weight|Hours|
+------------+------+-----+
|123         |30    |22   |
|456         |15    |10   |
+------------+------+-----+

(or)

By using Spark-SQL:

1.Create temporary table

>>> df.registerTempTable("temp")

2.Check the data in the table

>>> spark.sql("select * from temp").show(10,False)
+------------+------+-----+
|SubscriberId|Weight|Hours|
+------------+------+-----+
|123         |10    |7    |
|456         |15    |10   |
|123         |20    |15   |
+------------+------+-----+

3.Do groupby,sum using sql syntax

>>> spark.sql("select SubscriberId,sum(Weight) Weight,sum(Hours) Hours from temp group by SubscriberId").show(10,False)
+------------+------+-----+
|SubscriberId|Weight|Hours|
+------------+------+-----+
|123         |30    |22   |
|456         |15    |10   |
+------------+------+-----+