Pyspark issue AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'


Hello community, 

 

My first post here, so please let me know if I'm not following protocol.

 

I have written a pyspark.sql query as shown below. I would like the query results to be written to a text file, but I get the error:

 

AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'

 

Can someone take a look at the code and let me know where I'm going wrong:

 

 

#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.appName('aggs').getOrCreate()
    df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/sales_info.csv', inferSchema=True, header=True)
    df.createOrReplaceTempView('sales_info')

    example8 = spark.sql("""SELECT
        *
    FROM sales_info
    ORDER BY Sales DESC""")
    # This is the call that raises the AttributeError quoted above
    example8.saveAsTextFile("juyfd")

main()

Any help would be appreciated.

 

carlton

1 ACCEPTED SOLUTION


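As the error message states, the object, whether a DataFrame or a list, does not have a saveAsTextFile() method.

result.write.save() or result.toJavaRDD.saveAsTextFile() should do the work, provided result is the DataFrame itself and not the list returned by collect(). Note that toJavaRDD is the Scala/Java name; in PySpark the underlying RDD is exposed as result.rdd. You can also refer to the DataFrame or RDD API: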

 

https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.sql.DataFrameWriter

https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.rdd.RDD
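
For PySpark readers, the writer route suggested above might look roughly like the sketch below. The helper name save_query_result and the choice of CSV output with a header row are assumptions made for illustration, not something stated in the answer. It would be called on the DataFrame returned by spark.sql(...), for example save_query_result(example8, "juyfd").

```python
def save_query_result(df, out_dir):
    """Write a Spark DataFrame to a directory of CSV part files.

    `df` is expected to be a pyspark.sql.DataFrame, for example the
    object returned by spark.sql(...). `out_dir` must not exist yet,
    because the writer's default save mode refuses to overwrite.
    """
    # DataFrameWriter.csv handles multi-column results; header=True
    # writes the column names as the first line of each part file.
    df.write.csv(out_dir, header=True)
```

Spark writes a directory containing one part file per partition rather than a single file, so "juyfd" would end up as a folder.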


OK, as I'm not getting much assistance with my original question, I thought I would try to figure out the problem myself. So I rewrote the pyspark.sql query as follows:

 

 
#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('ops').getOrCreate()
df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/Person_Person.csv',inferSchema=True,header=True)
df.createOrReplaceTempView('Person_Person')
myresults = spark.sql("""SELECT
  PersonType
 ,COUNT(PersonType) AS `Person Count`
FROM Person_Person
GROUP BY PersonType""")
myresults.collect()
result = myresults.collect()
result
result.saveAsTextFile("test")

However, I'm now getting the following error message:

AttributeError: 'list' object has no attribute 'saveAsTextFile'

I think this could be an easier situation to help resolve.

So, if someone could help resolve this issue, that would be most appreciated.

Thanks
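
The second error arises because collect() returns an ordinary Python list of Row objects on the driver, and a list has no Spark methods. If the result is small enough to collect, one option is to write it with plain Python file I/O, as in the hedged sketch below. The helper write_rows, the temporary output path, and the sample values are all hypothetical; plain tuples stand in for the collected rows, which works because a pyspark Row is a tuple subclass.

```python
import csv
import os
import tempfile


def write_rows(rows, path, header=None):
    """Write an iterable of row tuples to a local CSV text file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)


# Stand-in for `result = myresults.collect()`. The values are made up.
result = [("EM", 10), ("SC", 3)]

# Hypothetical output location in a fresh temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "test.csv")
write_rows(result, out_path, header=["PersonType", "Person Count"])
```

This only suits results that fit in driver memory; for larger results, writing from the DataFrame itself avoids collecting at all.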
 


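To save a DataFrame as a text file in PySpark, either use the DataFrame writer or convert the DataFrame to an RDD first.

Using the DataFrame writer (the text format accepts only a single string column, so a multi-column result like the queries above needs df.write.csv(...) instead, or its columns concatenated into one string column first):

df.write.format("text").save("path_to_output_directory")

Converting to an RDD and then using saveAsTextFile (str(row) writes each line in its Row(...) repr form rather than as delimited values):

rdd = df.rdd.map(lambda row: str(row))
rdd.saveAsTextFile("path_to_output_directory")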
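
As a variation on the RDD approach, the rows can be joined into delimited lines before saving, so the output does not contain Row(...) wrappers. This is only a sketch: row_to_line and save_as_text are hypothetical helper names, the comma default and the empty string for nulls are assumptions, and the sample values are made up.

```python
def row_to_line(row, sep=","):
    """Join a row's fields into one delimited line of text.

    None values become empty strings. Fields are not quoted or
    escaped, so this suits data that does not contain `sep`.
    """
    return sep.join("" if value is None else str(value) for value in row)


def save_as_text(df, out_dir, sep=","):
    """Write each row of a pyspark DataFrame as one text line.

    Goes through the underlying RDD, where saveAsTextFile is defined.
    """
    df.rdd.map(lambda row: row_to_line(row, sep)).saveAsTextFile(out_dir)


# Plain tuples behave like Row objects here, since Row subclasses tuple.
print(row_to_line(("EM", 273)))  # prints EM,273
```

With the DataFrame from the second post, the call would be save_as_text(myresults, "test"), made on myresults itself rather than on the collected list.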