Performing Spark pivot without aggregation?

Expert Contributor

I am looking to perform a Spark pivot without aggregation. Is it really possible to use the Spark 2.1.0 API to generate a pivot without aggregating? I tried, but because pivot returns a grouped dataset, the API does not work for me without an aggregation. Any ideas on how to convert the result to a DataFrame or Dataset without performing an aggregation, and then show it? Thank you.

3 REPLIES

Expert Contributor

No, it is not possible:

"A pivot is an aggregation where one (or more in the general case) of the grouping columns has its distinct values transposed into individual columns"

Source: https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html

Expert Contributor

Hi Tibor, thanks for your reply. I have looked at the above link. I have a dataset with the following structure, stored in a .gz file that I am reading in Spark:

abcdefghij; abc=1234 xyz=987 abn=567 ubg=345

after pivot

abcdefghij  abn  ubg  abc   xyz
abcdefghij  567  987  1234  987

and so on.

All the above columns are string columns, and the abn values are duplicated. Since they are strings and I am just looking to split the data, I don't need aggregation; I just need the pivot. Any ideas on how to achieve this in Spark Scala? Thank you.
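For readers following along, the splitting step described above can be sketched in plain Scala. This is only a minimal sketch: the object name KeyValueLine and the method parse are hypothetical, and the "id; key=value key=value ..." layout is an assumption inferred from the single sample line in this post.

```scala
object KeyValueLine {
  // Parses one line of the assumed form "id; k1=v1 k2=v2 ..." into
  // (id, key, value) triples, i.e. the "long" layout that a pivot expects.
  def parse(line: String): Seq[(String, String, String)] = {
    val parts = line.split(";", 2)
    if (parts.length < 2) Seq.empty // no ';' delimiter: nothing to split
    else {
      val id = parts(0).trim
      parts(1).trim.split("\\s+").toSeq.flatMap { token =>
        token.split("=", 2) match {
          case Array(k, v) => Some((id, k, v))
          case _           => None // skip tokens without '='
        }
      }
    }
  }
}
```

In Spark, a function like this could be applied with flatMap to the lines of the file (for example a Dataset[String] from spark.read.textFile) and the result turned into a three-column DataFrame, which is then in the right shape for groupBy and pivot.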

Contributor

This is an old question, but I thought I would reply anyway.

You can do df.groupBy().pivot("pivotcolname").agg(...)

Notice that the groupBy clause is empty.
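To make the suggestion concrete, here is a minimal sketch of a pivot where the aggregate only passes values through. The column names id, key and value, the object name and the helper pivotFirst are all hypothetical, and the sample rows are taken from the line posted earlier in the thread. It groups by id rather than using an empty groupBy, since an empty groupBy collapses the whole DataFrame into a single row. Using first as the aggregate is an assumption about what fits here: it is only safe when each (id, key) pair occurs once.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.first

object PivotWithoutRealAggregation {
  // Turns long-format rows (id, key, value) into one column per distinct key.
  // first() is a pass-through aggregate: pivot still requires an aggregation,
  // but with a single value per (id, key) nothing is actually combined.
  def pivotFirst(longDf: DataFrame): DataFrame =
    longDf.groupBy("id").pivot("key").agg(first("value"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pivot-first")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Sample data in long format, built from the line in the question.
    val longDf = Seq(
      ("abcdefghij", "abc", "1234"),
      ("abcdefghij", "xyz", "987"),
      ("abcdefghij", "abn", "567"),
      ("abcdefghij", "ubg", "345")
    ).toDF("id", "key", "value")

    pivotFirst(longDf).show()
    spark.stop()
  }
}
```

If a key can repeat within the same id, as mentioned for abn, first keeps only one of the values; replacing it with collect_list would keep all of them as an array instead.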