08-05-2018 02:41 AM
My first post here, so please let me know if I'm not following protocol.
I have written a pyspark.sql query, shown below. I would like the query results to be written to a text file, but I get the error:
AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'
Can someone take a look at the code and let me know where I'm going wrong:
#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.appName('aggs').getOrCreate()
    df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/sales_info.csv',inferSchema=True,header=True)
    df.createOrReplaceTempView('sales_info')
    example8 = spark.sql("""SELECT * FROM sales_info ORDER BY Sales DESC""")
    example8.saveAsTextFile("juyfd")

main()

Any help would be appreciated.
08-05-2018 05:15 PM
OK, as I'm not getting much assistance with my original question, I thought I would try to figure out the problem myself. So I rewrote the pyspark.sql query as follows:
#%%
import findspark
findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('ops').getOrCreate()
df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/Person_Person.csv',inferSchema=True,header=True)
df.createOrReplaceTempView('Person_Person')
myresults = spark.sql("""SELECT
PersonType
,COUNT(PersonType) AS `Person Count`
FROM Person_Person
GROUP BY PersonType""")
myresults.collect()
result = myresults.collect()
result
result.saveAsTextFile("test")
08-14-2018 01:47 AM
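As the error message states, the object you are calling it on does not have a saveAsTextFile() method. That is true of a DataFrame (your first post) and also of the plain Python list that collect() returns (your second post); saveAsTextFile() is an RDD method.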
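In PySpark, call the writer on the DataFrame itself rather than on the collected list: myresults.write.save("test") (which writes Parquet by default; use myresults.write.csv("test") for text output) or myresults.rdd.saveAsTextFile("test") should do the work. Note that toJavaRDD is the Scala/Java spelling; the PySpark equivalent is the .rdd attribute. You can also refer to the DataFrame or RDD API documentation.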
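To make that concrete, here is a minimal sketch of the three ways I would try, reusing the query from the second post. It is not tested against your data: the in-memory rows standing in for Person_Person.csv, the output names test_csv, test_txt and test_local.txt, and the helpers row_to_line and write_rows are all made up for illustration.

```python
def row_to_line(row):
    """Format one row (any sequence of values) as a comma-separated line."""
    return ",".join("" if v is None else str(v) for v in row)


def write_rows(rows, path):
    """Write already-collected rows to a single local text file."""
    with open(path, "w") as f:
        for row in rows:
            f.write(row_to_line(row) + "\n")


def main():
    # Imported here so the two helpers above work without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("save_results").getOrCreate()

    # Hypothetical stand-in for Person_Person.csv.
    df = spark.createDataFrame([("EM",), ("EM",), ("SC",)], ["PersonType"])
    df.createOrReplaceTempView("Person_Person")
    myresults = spark.sql(
        """SELECT PersonType, COUNT(PersonType) AS `Person Count`
           FROM Person_Person
           GROUP BY PersonType"""
    )

    # Option 1: the DataFrame writer. Produces a directory of part files.
    myresults.write.mode("overwrite").csv("test_csv", header=True)

    # Option 2: the underlying RDD, which does have saveAsTextFile().
    # This fails if the output directory already exists.
    myresults.rdd.map(row_to_line).saveAsTextFile("test_txt")

    # Option 3: collect() returns a plain Python list of Row objects,
    # so write it with ordinary file I/O (only sensible for small results).
    write_rows(myresults.collect(), "test_local.txt")

    spark.stop()

# Call main() from a session where PySpark is installed.
```

Options 1 and 2 write a directory containing one part file per partition, not a single file; option 3 gives a single file but pulls every row onto the driver, so it only suits small result sets like this aggregate.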