How to display a pivoted DataFrame with PySpark?

I cannot display/show/print a pivoted DataFrame with PySpark. The pivoted DataFrame is apparently created fine, but when I try to show it I get AttributeError: 'GroupedData' object has no attribute 'show'.

Here's the code

meterdata = sqlContext.read.format("com.databricks.spark.csv").option("delimiter", ",").option("header", "false").load("/CBIES/meters/")

metercols = meterdata.groupBy("C0").pivot("C1")

metercols.show()

Output:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-8003809301447367155.py", line 239, in <module>
    eval(compiledCode)
  File "<string>", line 1, in <module>
AttributeError: 'GroupedData' object has no attribute 'show'

Re: How to display a pivoted DataFrame with PySpark?

After pivoting you need to run an aggregate function (e.g. sum) to get back a DataFrame/Dataset.

After aggregation you'll be able to show() the data.

You can find an excellent overview of pivoting at this website:

https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html
