How to display a pivoted DataFrame with PySpark?


I cannot display/show/print a pivoted DataFrame with PySpark. The pivoted DataFrame appears to be created without error, but calling show() on it raises AttributeError: 'GroupedData' object has no attribute 'show'.

Here's the code

meterdata = sqlContext.read.format("com.databricks.spark.csv").option("delimiter", ",").option("header", "false").load("/CBIES/meters/")

metercols = meterdata.groupBy("C0").pivot("C1")

metercols.show()

Output:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-8003809301447367155.py", line 239, in <module>
    eval(compiledCode)
  File "<string>", line 1, in <module>
AttributeError: 'GroupedData' object has no attribute 'show'

1 Reply


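groupBy(...).pivot(...) returns a GroupedData object, not a DataFrame, which is why show() is not available. After pivoting you need to run an aggregate function (e.g. sum) to get back a DataFrame/Dataset.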

After aggregation you'll be able to show() the data.

You can find an excellent overview of pivoting at this website:

https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html