Support Questions

bigspark · 02-09-2017

I am looking to perform a Spark pivot without aggregation. Is it actually possible, using the Spark 2.1.0 API, to generate a pivot without an aggregation? I tried, but because pivot returns a grouped dataset (RelationalGroupedDataset) that still needs an aggregation, the API is not working for me. Any ideas how to convert it to a DataFrame or Dataset without performing an aggregation, and then show the DataFrame? Thank you.

tkiss · 02-09-2017

No, it is not possible:

"A pivot is an aggregation where one (or more in the general case) of the grouping columns has its distinct values transposed into individual columns"

Source: https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html
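To make this concrete, here is a minimal sketch, assuming Spark 2.x with the Scala API; the object name, column names (id, key, value) and sample rows are hypothetical. pivot() returns a RelationalGroupedDataset, which only becomes a DataFrame again after an aggregate is applied. When each (id, key) cell holds exactly one row, first() acts as a pass-through, which is the usual way to get the effect of a "pivot without aggregation" even though Spark still runs an aggregation.

```scala
import org.apache.spark.sql.{DataFrame, RelationalGroupedDataset, SparkSession}
import org.apache.spark.sql.functions.first

object PivotWithFirst {
  // Reshapes a long (id, key, value) DataFrame into one column per distinct key.
  // Spark still requires an aggregate after pivot(); first() simply returns
  // the value when each (id, key) cell contains exactly one row.
  def toWide(longDf: DataFrame): DataFrame = {
    // pivot() yields a RelationalGroupedDataset, not a DataFrame
    val grouped: RelationalGroupedDataset = longDf.groupBy("id").pivot("key")
    grouped.agg(first("value"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pivot-with-first")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: one row per (id, key) cell
    val longDf = Seq(
      ("r1", "abc", "1234"),
      ("r1", "xyz", "987"),
      ("r2", "abc", "42")
    ).toDF("id", "key", "value")

    // r2 has no "xyz" row, so its xyz cell comes out as null
    toWide(longDf).orderBy("id").show()
    spark.stop()
  }
}
```

When no pivot values are passed, Spark collects the distinct values of the pivot column itself and uses them, sorted, as the new column names.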

bigspark · 02-09-2017

Hi Tibor, thanks for your reply. I have looked at the link above. My dataset is a .gz file that I am reading in Spark, and each line has this structure:

abcdefghij; abc=1234 xyz=987 abn=567 ubg=345

After the pivot:

abcdefghij	abn	ubg	abc	xyz
abcdefghij	567	345	1234	987

and so on.

All the columns above are string columns, and the abn values are duplicated. Because they are strings and I am only looking to split the data, I don't need an aggregation; I just need a pivot. Any ideas on how to achieve this in Spark Scala? Thank you.
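A minimal sketch for the line format shown above, assuming every line looks like "<id>; key=value key=value ..." and sits in a column named "value" (the column spark.read.text produces); the object and method names and the sample line are hypothetical. It splits each line into (id, key, value) rows and then pivots with first(). Spark still runs an aggregation, but because each id has a single value per key, first() just passes that value through. If the same key can repeat within one id, first() keeps an arbitrary one of them; collect_list would keep all of them instead.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, first, split, trim}

object ParseAndPivot {
  // Assumes each line looks like "<id>; k1=v1 k2=v2 ..." in a column named "value".
  def parseLines(lines: DataFrame): DataFrame =
    lines
      .select(split(col("value"), ";").as("parts"))
      .select(
        trim(col("parts").getItem(0)).as("id"),
        // one output row per "key=value" token, split on runs of whitespace
        explode(split(trim(col("parts").getItem(1)), "\\s+")).as("kv")
      )
      .select(
        col("id"),
        split(col("kv"), "=").getItem(0).as("key"),
        split(col("kv"), "=").getItem(1).as("value")
      )

  // first() is the aggregate Spark requires; with one value per (id, key)
  // it just passes that value through.
  def toWide(kv: DataFrame): DataFrame =
    kv.groupBy("id").pivot("key").agg(first("value"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("parse-and-pivot")
      .getOrCreate()
    import spark.implicits._

    // In practice: spark.read.text("/path/to/data.gz") (hypothetical path);
    // gzip-compressed text files are decompressed transparently on read.
    val lines = Seq("abcdefghij; abc=1234 xyz=987 abn=567 ubg=345").toDF("value")

    toWide(parseLines(lines)).show()
    spark.stop()
  }
}
```

The pivoted columns come out sorted by key (abc, abn, ubg, xyz) rather than in the order they appear in the line; passing an explicit list of values to pivot would fix the order and also saves Spark the extra pass it needs to find the distinct keys.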

devquestions2 · 01-22-2019

This is an old question, but I thought I would reply.

You can do df.groupBy().pivot("pivotcolname").agg(...)

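Notice that the groupBy clause is empty: with no grouping columns, the whole DataFrame is treated as a single group, so the result is one row with one column per distinct pivot value.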
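A short sketch of that empty-groupBy form, assuming (key, value) rows like the ones in the question; the object name, column names and data are hypothetical, and first() is just one choice for the aggregate. Because there are no grouping columns, the output has exactly one row. That fits a single record, but with many records the values from different rows are merged into that one row, and first() keeps only one of them per column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.first

object EmptyGroupByPivot {
  // groupBy() with no columns treats the whole DataFrame as one group,
  // so the pivot returns exactly one row with a column per distinct key.
  def pivotAll(kv: DataFrame): DataFrame =
    kv.groupBy().pivot("key").agg(first("value"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("empty-groupby-pivot")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical (key, value) rows belonging to a single record
    val kv = Seq(
      ("abc", "1234"), ("xyz", "987"), ("abn", "567"), ("ubg", "345")
    ).toDF("key", "value")

    pivotAll(kv).show()
    spark.stop()
  }
}
```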

Cloudera Community

Performing Spark pivot without aggregation?