New Contributor
Posts: 2
Registered: ‎08-05-2018
Accepted Solution

Pyspark issue AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'

Hello community, 

 

My first post here, so please let me know if I'm not following protocol.

 

I have written a pyspark.sql query as shown below. I would like the query results to be sent to a textfile but I get the error:

 

AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'

 

Can someone take a look at the code and let me know where I'm going wrong:

 

 

#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.appName('aggs').getOrCreate()
    df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/sales_info.csv', inferSchema=True, header=True)
    df.createOrReplaceTempView('sales_info')

    example8 = spark.sql("""SELECT
        *
    FROM sales_info
    ORDER BY Sales DESC""")
    example8.saveAsTextFile("juyfd")

main()

Any help would be appreciated.

 

carlton

New Contributor
Posts: 2
Registered: ‎08-05-2018

Re: Pyspark issue AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'

OK, as I'm not getting much assistance with my original question, I thought I would try to figure out the problem myself. So I rewrote the pyspark.sql query as follows:

 

 
#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('ops').getOrCreate()
df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/Person_Person.csv',inferSchema=True,header=True)
df.createOrReplaceTempView('Person_Person')
myresults = spark.sql("""SELECT
  PersonType
 ,COUNT(PersonType) AS `Person Count`
FROM Person_Person
GROUP BY PersonType""")
myresults.collect()
result = myresults.collect()
result
result.saveAsTextFile("test")

However, I'm now getting the following error message:

AttributeError: 'list' object has no attribute 'saveAsTextFile'

I think this could be an easier situation to help resolve.

So, if someone could help resolve this issue, that would be most appreciated.
 
Thanks
 
Cloudera Employee
Posts: 53
Registered: ‎03-01-2016

Re: Pyspark issue AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'

As the error message states, neither a DataFrame nor a list has a saveAsTextFile() method; that method is defined on RDDs.

 

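Calling myresults.write.save() or myresults.rdd.saveAsTextFile() on the DataFrame itself (not on the list returned by collect()) should do the work. In PySpark the underlying RDD is reached through .rdd; toJavaRDD belongs to the Scala/Java API. Note that write.save() writes Parquet by default, so use write.csv() if you want plain-text output. You can also refer to the DataFrame or RDD API: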

 

https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.sql.DataFrameWriter

https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.rdd.RDD
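
To make the options above concrete, here is a minimal sketch, not a definitive implementation. The three helper function names are made up for illustration. The first two assume df is a PySpark DataFrame such as myresults before collect() is called; the third assumes the result is small enough to collect to the driver and writes it with Python's standard csv module instead of Spark.

```python
import csv


def save_with_dataframe_writer(df, out_dir):
    # Assumes df is a pyspark.sql.DataFrame (hypothetical helper name).
    # Spark writes a directory of part-* files, not a single file.
    df.write.mode("overwrite").csv(out_dir, header=True)


def save_with_rdd(df, out_dir):
    # Assumes df is a pyspark.sql.DataFrame (hypothetical helper name).
    # The underlying RDD does have saveAsTextFile(); each Row becomes one
    # comma-joined line. Spark raises an error if out_dir already exists.
    df.rdd.map(lambda row: ",".join(str(v) for v in row)).saveAsTextFile(out_dir)


def save_collected_rows(rows, columns, out_file):
    # rows is the plain Python list returned by collect(). Row objects are
    # tuples, so the csv module can write them directly on the driver.
    with open(out_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)
```

With the second post's code this would be called as save_with_dataframe_writer(myresults, "test") or save_collected_rows(myresults.collect(), myresults.columns, "test.csv"). The Spark-based variants scale to large results; the collect() variant only suits results that fit in driver memory.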
