Support Questions

barlow · 08-05-2018

Hello community,

My first post here, so please let me know if I'm not following protocol.

I have written a pyspark.sql query as shown below. I would like the query results to be saved to a text file, but I get the error:

AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'

Can someone take a look at the code and let me know where I'm going wrong:

#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
from pyspark.sql import SparkSession

def main():

    spark = SparkSession.builder.appName('aggs').getOrCreate()
    df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/sales_info.csv', inferSchema=True, header=True)
    df.createOrReplaceTempView('sales_info')

    example8 = spark.sql("""SELECT
        *
    FROM sales_info
    ORDER BY Sales DESC""")
    example8.saveAsTextFile("juyfd")  # <- the AttributeError is raised here

main()

Any help would be appreciated.

carlton


barlow · 08-05-2018

Ok, since I'm not getting much help with my original question, I thought I would try to figure out the problem myself. So I rewrote the pyspark.sql query as follows:

#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('ops').getOrCreate()
df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/Person_Person.csv',inferSchema=True,header=True)
df.createOrReplaceTempView('Person_Person')
myresults = spark.sql("""SELECT
  PersonType
 ,COUNT(PersonType) AS `Person Count`
FROM Person_Person
GROUP BY PersonType""")
result = myresults.collect()  # collect() returns a Python list of Row objects
result
result.saveAsTextFile("test")  # <- the AttributeError is raised here

However, I'm now getting the following error message:

AttributeError: 'list' object has no attribute 'saveAsTextFile'

I think this might be an easier problem to solve.

So, if someone could help resolve this issue, that would be much appreciated.

Thanks

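Yuexin Zhang · 08-14-2018

As the error messages state, neither object has a saveAsTextFile() method: in the first snippet it is a DataFrame, and in the second it is the Python list returned by collect().

Calling the writer on the DataFrame itself (myresults.write.save(), not result.write.save(), since result is the collected list) or going through its RDD (toJavaRDD.saveAsTextFile() in the Scala API, myresults.rdd.saveAsTextFile() in PySpark) should do the job. Note that write.save() without an explicit format writes Parquet by default. Otherwise, you can refer to the DataFrameWriter or RDD API: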

https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.sql.DataFrameWriter

https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.rdd.RDD
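Since collect() has already brought the rows back to the driver as a Python list of Row objects, a small result like the PersonType counts can also be written with ordinary Python file I/O, with no Spark writer involved. A minimal sketch, assuming the result fits comfortably in driver memory; row_to_line and write_collected_rows are hypothetical helper names, not Spark APIs:

```python
def row_to_line(row, sep=","):
    """Join the values of a Row (Row subclasses tuple) into one delimited line.

    None is written as an empty field. No quoting or escaping is done, so this
    assumes the values contain no separators or newlines.
    """
    return sep.join("" if value is None else str(value) for value in row)


def write_collected_rows(rows, path, sep=","):
    """Write rows already collected to the driver into a single local text file."""
    with open(path, "w", encoding="utf-8") as out:
        for row in rows:
            out.write(row_to_line(row, sep) + "\n")
```

With the second snippet that would be write_collected_rows(result, "test.txt"). Unlike saveAsTextFile(), this produces one file on the driver's local file system rather than a directory of part files, which is why it only suits small results.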

krunal_lathiya · 01-02-2024

To save a DataFrame as a text file in PySpark, you need to convert it to an RDD first, or use DataFrame writer functions.

Using the DataFrame writer (the text format expects a single string column):

df.write.format("text").save("path_to_output_directory")
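The text source writes exactly one string column, so on a DataFrame with more than one column (as a SELECT * over sales_info most likely is) Spark rejects that line with an AnalysisException. A minimal sketch of two workarounds, assuming df is the query result and output_dir is a path of your choosing; both function names are hypothetical. The first lets the built-in CSV writer produce delimited text; the second collapses the columns into one string column before using the text writer:

```python
def save_with_csv_writer(df, output_dir):
    """Write a multi-column DataFrame as comma-separated text via the CSV source."""
    df.write.csv(output_dir, mode="overwrite", header=True)


def save_with_text_writer(df, output_dir, sep=","):
    """Collapse all columns into one string column, then use the text source."""
    # Imported inside the function so this module loads even without PySpark
    from pyspark.sql.functions import col, concat_ws

    # concat_ws skips NULLs entirely, so a row with a NULL field comes out with
    # one fewer field; prefer save_with_csv_writer if that matters.
    line = concat_ws(sep, *[col(c).cast("string") for c in df.columns]).alias("value")
    df.select(line).write.mode("overwrite").text(output_dir)
```

Both variants still write a directory of part files rather than a single file, and mode="overwrite" replaces any existing output at that path.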

Converting to an RDD and then using saveAsTextFile:

rdd = df.rdd.map(lambda row: str(row))  # each line is the Row's repr, e.g. Row(col1=..., col2=...)
rdd.saveAsTextFile("path_to_output_directory")
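Because str(row) writes each Row's repr, the output lines look like Row(...) rather than plain values. A minimal sketch that writes tab-separated values instead and merges the output into one part file, assuming the result is small (coalesce(1) pushes all data through a single task); to_tsv_line and save_rdd_as_single_text_file are hypothetical names:

```python
def to_tsv_line(row):
    """Tab-separated values of a Row (a tuple subclass); None becomes ''."""
    return "\t".join("" if value is None else str(value) for value in row)


def save_rdd_as_single_text_file(df, output_dir):
    """Write df as tab-separated text through its RDD.

    saveAsTextFile() always creates a directory of part-* files and fails if
    output_dir already exists; coalesce(1) reduces the output to one part file.
    """
    df.rdd.map(to_tsv_line).coalesce(1).saveAsTextFile(output_dir)
```

The single part file (typically named part-00000) still sits inside output_dir, so read or rename it from there.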

Cloudera Community › Support Questions › Pyspark issue AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'
