How to overwrite a file with pyspark

Hello community,

I'm using the following script to output the results of a Spark SQL query to a file in Azure Data Lake Store. However, instead of creating a file called myresults.json and writing the results to it, the script writes the results to a randomly named file like part-0000-tid ... see image.

91748-pyspark.png

Can someone let me know how to make sure the file is created and overwritten each time the pyspark query is run?

Thanks

Sorry, I forgot to add the query:

example1 = spark.sql("""SELECT
  CF.CountryName AS CountryCarsSold
 ,COUNT(CF.CountryName) AS NumberCountry
 ,MAX(CB.SalesDetailsID) AS TotalSold
FROM Data_SalesDetails CB
INNER JOIN Data_Sales CD
  ON CB.SalesID = CD.SalesID
INNER JOIN Data_Customer CG
  ON CD.CustomerID = CG.CustomerID
INNER JOIN Data_Country CF
  ON CG.Country = CF.CountryISO2
GROUP BY CF.CountryName""")
example1.coalesce(1).write.mode("append").json("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/myoutput/myresults.json")

Hi guys, I'm sorry if the question seems a little confusing. Basically, I would just like to save the results to a single file and have that file overwritten each time the query runs.

Thanks
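
For context on the behaviour described above: Spark's DataFrameWriter treats the path passed to .json() as a directory and writes part-* files inside it, and mode("append") adds new part files on every run, whereas mode("overwrite") replaces the existing output. Getting one file with a fixed name therefore usually takes a second step that moves the single part file produced by coalesce(1). The following is only a minimal sketch of that step, assuming the output directory is reachable through the local filesystem. promote_part_file, tmp_dir and final_path are hypothetical names, not part of PySpark. For an adl:// URL the same move would have to go through the Hadoop FileSystem API or a platform file utility instead of os.replace.

```python
import glob
import os
import shutil


def promote_part_file(output_dir, target_path, pattern="part-*.json"):
    """Move the single part file found in output_dir to target_path.

    Hypothetical helper for locally reachable paths; not part of PySpark.
    target_path must be outside output_dir, because output_dir is deleted
    afterwards, and both must be on the same filesystem for os.replace.
    """
    parts = sorted(glob.glob(os.path.join(output_dir, pattern)))
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    # os.replace overwrites target_path if it already exists.
    os.replace(parts[0], target_path)
    # Remove what is left of the Spark output directory (_SUCCESS, .crc files).
    shutil.rmtree(output_dir)
    return target_path


# Assumed usage after the Spark write (tmp_dir and final_path are placeholders):
#   example1.coalesce(1).write.mode("overwrite").json(tmp_dir)
#   promote_part_file(tmp_dir, final_path)
```

With this arrangement each run rewrites tmp_dir from scratch and then replaces the previous myresults.json, so the fixed file name always holds the latest results.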