Created on 08-05-2018 02:41 AM - edited 09-16-2022 06:33 AM
Hello community,
My first post here, so please let me know if I'm not following protocol.
I have written a pyspark.sql query as shown below. I would like the query results to be written to a text file, but I get the error:
AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'
Can someone take a look at the code and let me know where I'm going wrong:
#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.appName('aggs').getOrCreate()
    df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/sales_info.csv', inferSchema=True, header=True)
    df.createOrReplaceTempView('sales_info')
    example8 = spark.sql("""SELECT * FROM sales_info ORDER BY Sales DESC""")
    example8.saveAsTextFile("juyfd")

main()
Any help would be appreciated
carlton
Created 08-14-2018 01:47 AM
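As the error message states, the object (either a DataFrame or a list) does not have a saveAsTextFile() method.
Called on the DataFrame, result.write.save() or result.toJavaRDD.saveAsTextFile() should do the work. Note that toJavaRDD belongs to the Scala/Java API; in PySpark the underlying RDD is reached through the DataFrame's .rdd attribute. You can also refer to the DataFrameWriter or RDD API: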
https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.sql.DataFrameWriter
https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.rdd.RDD
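For the list case: DataFrame.collect() returns a plain Python list of Row objects on the driver, so there is no Spark writer attached to it at all. If the result is small enough to collect, ordinary Python file I/O may be all that is needed. The sketch below is only an illustration: write_rows_to_text is a hypothetical helper (not a PySpark function), and the sample rows are invented stand-ins for whatever collect() returns. A Row is a tuple subclass, so plain tuples behave the same way here.

```python
import os
import tempfile


def write_rows_to_text(rows, path, sep=","):
    """Write row-like tuples to a single local text file, one line per row."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(sep.join(str(value) for value in row) + "\n")


# Stand-in for `result = myresults.collect()`; these values are made up.
result = [("EM", 273), ("SC", 753)]

# Write to a temporary directory so the example does not clutter the working dir.
out_path = os.path.join(tempfile.mkdtemp(), "test.txt")
write_rows_to_text(result, out_path)

with open(out_path, encoding="utf-8") as handle:
    written = handle.read()
print(written)
```

Unlike saveAsTextFile, this produces one ordinary file on the driver machine rather than a directory of part files, so it only suits results that fit in driver memory.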
Created 08-05-2018 05:15 PM
OK, as I'm not getting much assistance with my original question, I thought I would try to figure out the problem myself. So I rewrote the pyspark.sql query as follows:
#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('ops').getOrCreate()
df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/Person_Person.csv', inferSchema=True, header=True)
df.createOrReplaceTempView('Person_Person')
myresults = spark.sql("""SELECT PersonType ,COUNT(PersonType) AS `Person Count` FROM Person_Person GROUP BY PersonType""")
myresults.collect()
result = myresults.collect()
result
result.saveAsTextFile("test")
Created 01-02-2024 03:09 AM
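To save a DataFrame as a text file in PySpark, either use the DataFrame writer or convert the DataFrame to an RDD first. Note that the "text" format only accepts a DataFrame with a single string column, so a multi-column result has to be combined into one column first or written with another format such as CSV.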
Using DataFrame writer:
df.write.format("text").save("path_to_output_directory")
Converting to an RDD and then using saveAsTextFile (str(row) writes each line in its Row(...) form):
rdd = df.rdd.map(lambda row: str(row))
rdd.saveAsTextFile("path_to_output_directory")
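Because str(row) writes each line in the Row(...) representation, it may be preferable to format the fields yourself before saving. Here is a minimal sketch of that idea. Both row_to_line and save_as_delimited_text are hypothetical helpers introduced for illustration, not part of PySpark, and the sample rows are invented. The formatting step is checked locally on plain tuples, which works because a Row is a tuple subclass.

```python
def row_to_line(row, sep=","):
    """Join the fields of one row into a delimited line; None becomes an empty field."""
    return sep.join("" if value is None else str(value) for value in row)


def save_as_delimited_text(df, path, sep=","):
    """Write each DataFrame row as one delimited line under the directory `path`."""
    # df.rdd exposes the rows as an RDD; saveAsTextFile is an RDD method,
    # which is why calling it directly on the DataFrame raises AttributeError.
    df.rdd.map(lambda row: row_to_line(row, sep)).saveAsTextFile(path)


# Local check of the formatting step only; no Spark session is needed for this part.
sample_rows = [("EM", 273), ("SC", None)]  # made-up values
lines = [row_to_line(row) for row in sample_rows]
print(lines)  # ['EM,273', 'SC,']
```

With a live session, the call would look like save_as_delimited_text(df, "path_to_output_directory"). As with any saveAsTextFile call, the output is a directory of part files, and the call fails if that directory already exists.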