- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Performing Spark pivot without aggregation?
- Labels:
-
Apache Spark
Created ‎02-09-2017 05:06 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Iam looking to perform spark pivot without aggregation, is it really possible to use the spark 2.1.0 api and generate a pivot without aggregation. I tried but as the pivot returns groupeddataset without aggregation the api is not working for me. Any ideas how to convert to DF or dataset without performing aggregation and show the DF please. Thank you
Created ‎02-09-2017 08:28 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
No, it is not possible:
"A pivot is an aggregation where one (or more in the general case) of the grouping columns has its distinct values transposed into individual columns"
Source: https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html
Created ‎02-09-2017 09:03 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Tibor thanks for your reply. I have looked at the above link. I have a dataset of structure i.e. .gz file which iam reading in spark
abcdefghij; abc=1234 xyz=987 abn=567 ubg=345
after pivot
abcdefghij | abn | ubg | abc | xyz |
abcdefghij | 567 | 987 | 1234 | 987 |
and so on.
All the above columns are string columns and abn values are duplicated. So as they are strings and iam just looking to split the data thats why i dont need aggregation, i just need pivot. Any ideas to acheive this in spark scala? thank u
Created ‎01-22-2019 04:27 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This is old question, but just thought of replying
you can do df.groupBY().pivot("pivotcolname).agg(...)
Notice that groypBy clause is empty
