Support Questions

How to display a pivoted DataFrame with PySpark?

Contributor

I cannot display, show, or print a pivoted DataFrame with PySpark. The pivoted DataFrame appears to be created without error, but when I try to show it I get AttributeError: 'GroupedData' object has no attribute 'show'.

Here's the code:

meterdata = sqlContext.read.format("com.databricks.spark.csv").option("delimiter", ",").option("header", "false").load("/CBIES/meters/")
metercols = meterdata.groupBy("C0").pivot("C1")
metercols.show()

Output:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-8003809301447367155.py", line 239, in <module>
    eval(compiledCode)
  File "<string>", line 1, in <module>
AttributeError: 'GroupedData' object has no attribute 'show'

1 REPLY

Expert Contributor

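groupBy(...).pivot(...) returns a GroupedData object, not a DataFrame, which is why show() is missing. After pivoting you need to run an aggregate function (e.g. sum) to get back a DataFrame/Dataset.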

After aggregation you'll be able to show() the data.

You can find an excellent overview of pivoting at this website:

https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html