Created 02-09-2017 05:06 PM
I am looking to perform a Spark pivot without aggregation. Is it really possible to use the Spark 2.1.0 API to generate a pivot without aggregation? I tried, but because pivot returns a grouped dataset, the API does not work for me without an aggregation. Any ideas on how to convert the result to a DataFrame or Dataset without performing an aggregation, and then show the DataFrame? Thank you.
Created 02-09-2017 08:28 PM
No, it is not possible:
"A pivot is an aggregation where one (or more in the general case) of the grouping columns has its distinct values transposed into individual columns"
Source: https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html
Created 02-09-2017 09:03 PM
Hi Tibor, thanks for your reply. I have looked at the link above. I have a dataset with the following structure, a .gz file that I am reading in Spark:
abcdefghij; abc=1234 xyz=987 abn=567 ubg=345
after pivot
abcdefghij | abn | ubg | abc | xyz |
abcdefghij | 567 | 345 | 1234 | 987 |
and so on.
All of the above columns are string columns, and the abn values are duplicated. Since they are strings and I am only looking to split the data, I don't need aggregation; I just need the pivot. Any ideas on how to achieve this in Spark Scala? Thank you.
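One way to approach the layout described above is to split each line into (id, key, value) rows and then pivot with first as a pass-through aggregate. This is a minimal sketch, not a confirmed solution from this thread: the object name KeyValuePivot, the helper toWide, and the column names line, id, key and value are assumptions made for illustration. When a key occurs once per id, first simply returns that value; when a key is duplicated, first keeps only one of the values.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, first, split, trim}

object KeyValuePivot {
  // Expects a single string column named "line" (hypothetical name),
  // formatted like "abcdefghij; abc=1234 xyz=987 abn=567 ubg=345".
  def toWide(lines: DataFrame): DataFrame = {
    val pairs = lines
      // Separate the id from the space-delimited key=value pairs.
      .withColumn("parts", split(col("line"), ";"))
      .select(
        trim(col("parts").getItem(0)).as("id"),
        // One output row per key=value pair.
        explode(split(trim(col("parts").getItem(1)), "\\s+")).as("pair"))
      .withColumn("kv", split(col("pair"), "="))
      .select(
        col("id"),
        col("kv").getItem(0).as("key"),
        col("kv").getItem(1).as("value"))

    // pivot still requires an aggregate; first() passes the string through.
    pairs.groupBy("id").pivot("key").agg(first("value"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("key-value-pivot")
      .getOrCreate()
    import spark.implicits._

    // Sample line from the post; a real job might read the .gz file
    // with spark.read.text(path) and rename the column to "line".
    val lines = Seq("abcdefghij; abc=1234 xyz=987 abn=567 ubg=345").toDF("line")
    toWide(lines).show(false)

    spark.stop()
  }
}
```

If every duplicate value must be kept rather than just one, an aggregate such as collect_list could replace first, at the cost of array-typed columns.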
Created 01-22-2019 04:27 PM
This is an old question, but I thought I would reply anyway.
You can do df.groupBy().pivot("pivotcolname").agg(...)
Notice that the groupBy clause is empty.
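A small sketch of the empty groupBy form, assuming a DataFrame with two string columns named key and value (hypothetical names, as are EmptyGroupByPivot and pivotAll). An aggregate is still needed; first is used here only to pass the value through. With no grouping columns the whole DataFrame collapses into a single row, so this fits data for one record at a time; with several records, grouping by an id column is likely what you want instead.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.first

object EmptyGroupByPivot {
  // No grouping columns: every distinct "key" becomes a column
  // and the entire input is reduced to one output row.
  def pivotAll(df: DataFrame): DataFrame =
    df.groupBy().pivot("key").agg(first("value"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("empty-groupby-pivot")
      .getOrCreate()
    import spark.implicits._

    // Key/value pairs taken from the sample line earlier in the thread.
    val df = Seq(
      ("abc", "1234"),
      ("xyz", "987"),
      ("abn", "567"),
      ("ubg", "345")
    ).toDF("key", "value")

    pivotAll(df).show(false)
    spark.stop()
  }
}
```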