How to display a pivoted DataFrame with PySpark?
Labels: Apache Spark
Created 01-27-2017 01:04 PM
I cannot display/show/print a pivoted DataFrame with PySpark. The pivoted DataFrame is apparently created fine, but when I try to show it I get: AttributeError: 'GroupedData' object has no attribute 'show'.
Here's the code
meterdata = sqlContext.read.format("com.databricks.spark.csv") \
    .option("delimiter", ",") \
    .option("header", "false") \
    .load("/CBIES/meters/")
metercols = meterdata.groupBy("C0").pivot("C1")
metercols.show()
Output:
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-8003809301447367155.py", line 239, in <module>
    eval(compiledCode)
  File "<string>", line 1, in <module>
AttributeError: 'GroupedData' object has no attribute 'show'
Created 01-27-2017 08:10 PM
After pivoting you need to run an aggregate function (e.g. sum) to get back a DataFrame/Dataset; pivot() on its own returns a GroupedData object, which has no show() method.
After aggregation you'll be able to show() the data.
You can find an excellent overview of pivoting at this website:
https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html
