
Pyspark issue AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'


Hello community,

This is my first post here, so please let me know if I'm not following protocol.

I have written a pyspark.sql query, shown below. I would like the query results to be written to a text file, but I get this error:

AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'

Can someone take a look at the code and let me know where I'm going wrong?

#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.appName('aggs').getOrCreate()
    df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/sales_info.csv',inferSchema=True,header=True)
    df.createOrReplaceTempView('sales_info')

    example8 = spark.sql("""SELECT
        *
    FROM sales_info
    ORDER BY Sales DESC""")
    example8.saveAsTextFile("juyfd")

main()

Any help would be appreciated.


carlton


OK, as I'm not getting much assistance with my original question, I thought I would try to figure out the problem myself. I rewrote the pyspark.sql query as follows:

#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('ops').getOrCreate()
df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/Person_Person.csv',inferSchema=True,header=True)
df.createOrReplaceTempView('Person_Person')
myresults = spark.sql("""SELECT
  PersonType
 ,COUNT(PersonType) AS `Person Count`
FROM Person_Person
GROUP BY PersonType""")
myresults.collect()
result = myresults.collect()
result
result.saveAsTextFile("test")

However, I'm now getting the following error message:

AttributeError: 'list' object has no attribute 'saveAsTextFile'

I think this could be an easier situation to resolve, so if someone could help with this issue, that would be most appreciated.

Thanks


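saveAsTextFile is an RDD method, so it exists neither on a DataFrame nor on the plain Python list that collect() returns. To save a DataFrame as a text file in PySpark, either use the DataFrame writer or convert the DataFrame to an RDD first.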

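Using the DataFrame writer (the text format expects a DataFrame with a single string column, so a multi-column query result needs its columns combined first or a format such as csv):

df.write.format("text").save("path_to_output_directory")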
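
For a multi-column result like the queries in this thread, a minimal sketch of the writer approach using the csv format could look like the following. The function name save_query_result_as_csv and the output directory name are made up for illustration; the writer calls themselves (write, mode, csv with header) are standard DataFrameWriter methods.

```python
def save_query_result_as_csv(df, output_dir):
    """Write a Spark DataFrame as CSV part files under output_dir.

    Hypothetical helper for illustration; df is assumed to be a
    pyspark.sql.DataFrame such as the result of spark.sql(...).
    """
    # mode("overwrite") replaces output_dir if it already exists.
    # header=True writes the column names as the first line of each part file.
    df.write.mode("overwrite").csv(output_dir, header=True)


# Usage with the query from the original question (needs a running SparkSession):
# example8 = spark.sql("SELECT * FROM sales_info ORDER BY Sales DESC")
# save_query_result_as_csv(example8, "sales_sorted")
```

Note that Spark writes a directory of part files rather than a single text file, so the output path names a directory.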

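Converting to an RDD and then using saveAsTextFile (note that str(row) writes each line in its Row(...) representation rather than as plain delimited values):

rdd = df.rdd.map(lambda row: str(row))
rdd.saveAsTextFile("path_to_output_directory")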
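
If plain delimited lines are preferred over the Row(...) representation, a rough sketch along these lines may help. The helper names (row_to_line, save_dataframe_as_text, write_collected_rows) are hypothetical and not part of PySpark. It relies on Row objects being tuples, and the naive join does no quoting, so values containing the separator would need extra handling.

```python
def row_to_line(row, sep=","):
    """Join the values of one row into a single delimited line.

    Works on any tuple-like row, including pyspark.sql.Row.
    None values become empty strings.
    """
    return sep.join("" if value is None else str(value) for value in row)


def save_dataframe_as_text(df, output_dir, sep=","):
    """Distributed write: one delimited line per row, as part files in output_dir.

    df is assumed to be a pyspark.sql.DataFrame. saveAsTextFile is expected
    to fail if output_dir already exists.
    """
    df.rdd.map(lambda row: row_to_line(row, sep)).saveAsTextFile(output_dir)


def write_collected_rows(rows, file_path, sep=","):
    """Local write: rows is the plain Python list returned by df.collect()."""
    with open(file_path, "w") as handle:
        for row in rows:
            handle.write(row_to_line(row, sep) + "\n")


# Usage with the second query in this thread (needs a running SparkSession):
# save_dataframe_as_text(myresults, "test")
# write_collected_rows(myresults.collect(), "test.txt")
```

The local variant only suits results small enough to fit in driver memory, since collect() pulls every row back to the driver.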