Created on 10-18-2018 10:19 PM - edited 08-17-2019 07:16 PM
Hello community,
I'm using the following script to output the results of a Spark SQL query to a file in Azure Data Lake Store. However, instead of creating a file called myresults.json and writing the results to it, the script writes the results to a file with a random name like part-0000-tid ... (see image).
Can someone let me know how to make sure the file is created and overwritten each time the PySpark query is run?
Thanks
Created 10-18-2018 10:45 PM
Sorry, I forgot to add the query:
example1 = spark.sql("""
    SELECT CF.CountryName AS CountryCarsSold
          ,COUNT(CF.CountryName) AS NumberCountry
          ,MAX(CB.SalesDetailsID) AS TotalSold
    FROM Data_SalesDetails CB
    INNER JOIN Data_Sales CD ON CB.SalesID = CD.SalesID
    INNER JOIN Data_Customer CG ON CD.CustomerID = CG.CustomerID
    INNER JOIN Data_Country CF ON CG.Country = CF.CountryISO2
    GROUP BY CF.CountryName
""")
example1.coalesce(1).write.mode("append").json("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/myoutput/myresults.json")
Created 10-19-2018 10:13 AM
Hi guys, I'm sorry if the question seems a little confusing. Basically, I would just like to save the results to a single file and have that file overwritten each time the query is run.
Thanks
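One possible approach, sketched here under a few assumptions: Spark's DataFrameWriter treats the path passed to .json() as a directory and puts part-* files inside it, so the file name cannot be set directly. A common workaround is to write to a scratch directory with coalesce(1) and mode("overwrite"), then rename the single part file to the name you want. The helper names pick_part_file and write_single_json and the scratch directory are hypothetical, and the sketch reaches the Hadoop FileSystem API through the private _jvm and _jsc attributes of the SparkContext, which may behave differently on your cluster.

```python
def pick_part_file(names, suffix=".json"):
    """Return the single Spark part file among the names in an output directory."""
    parts = [n for n in names if n.startswith("part-") and n.endswith(suffix)]
    if len(parts) != 1:
        raise ValueError("expected exactly one part file, found %d" % len(parts))
    return parts[0]


def write_single_json(spark, df, tmp_dir, final_path):
    """Write df as one JSON file at final_path, replacing any previous output.

    spark and df are an existing SparkSession and DataFrame; tmp_dir is a
    hypothetical scratch directory on the same file system as final_path.
    """
    # Spark always writes a directory; coalesce(1) keeps it to one part file,
    # and mode("overwrite") replaces whatever an earlier run left in tmp_dir.
    df.coalesce(1).write.mode("overwrite").json(tmp_dir)

    # Hadoop FileSystem API via py4j (private attributes, an assumption).
    jvm = spark.sparkContext._jvm
    conf = spark.sparkContext._jsc.hadoopConfiguration()
    Path = jvm.org.apache.hadoop.fs.Path
    fs = Path(tmp_dir).getFileSystem(conf)

    # Find the one part file, skipping markers such as _SUCCESS.
    names = [status.getPath().getName() for status in fs.listStatus(Path(tmp_dir))]
    part = pick_part_file(names)

    # Remove the old target (file or directory), then move the part file into place.
    dest = Path(final_path)
    if fs.exists(dest):
        fs.delete(dest, True)
    fs.rename(Path(tmp_dir, part), dest)
    fs.delete(Path(tmp_dir), True)
```

With the query above, the call would look like write_single_json(spark, example1, tmp_dir, "adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/myoutput/myresults.json"), where tmp_dir is any scratch folder you choose in the same store. Note that earlier runs with mode("append") will have left myresults.json behind as a directory; the recursive delete in the sketch clears it before the rename.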