Member since: 08-05-2018
Posts: 73
Kudos Received: 0
Solutions: 0
10-21-2018
06:28 AM
Hi Shu, thanks for responding. The solution you provided appears a little difficult for something that I thought would be relatively simple. I will try your solution and let you know how I get on. In the meantime, have you seen the solution provided here: https://forums.databricks.com/questions/2848/how-do-i-create-a-single-csv-file-from-multiple-pa.html?childToView=12091
10-20-2018
02:46 PM
Hello Community, I am trying to create a single file from a query output, and I want that file to be overwritten each time the query is run. However, I keep on getting multiple part-00001 files. I have tried the following code. Each attempt appears to overwrite the output, but a different filename is generated each time.
example1.coalesce(1).write.option("header","true").mode("overwrite").csv("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/myoutput4/newresults")
example1.coalesce(1).write.option("header","true").mode("overwrite").csv("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/myoutput4/newresults/theresults.csv")
carl = example1.show()
example1.coalesce(1).write.mode("append").json("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/myoutput/myresults.json")
example1.repartition(1).write.format("csv").mode("overwrite").save("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/myoutput/thefile.csv")
Can someone show me how to write code that will result in a single file that is overwritten without changing the filename?
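For context on why the filename keeps changing: the path passed to csv(), json() or save() is treated by Spark as a directory, and the part-* file inside it gets a newly generated name on every run. One common workaround is to write to a scratch directory and then copy the single part file to a fixed name. The sketch below shows only that second step, using the Python standard library. It assumes a local or locally mounted filesystem, and promote_single_part_file is a hypothetical helper name, not part of Spark.

```python
import glob
import os
import shutil


def promote_single_part_file(spark_output_dir, final_path):
    """Copy the lone part-* file from a Spark output directory to final_path.

    Hypothetical helper. Assumes a local (or locally mounted) filesystem and
    that the DataFrame was written with coalesce(1) or repartition(1), so
    exactly one part file exists in spark_output_dir.
    """
    # "part-*" skips _SUCCESS and hidden .crc files written next to the data.
    part_files = sorted(glob.glob(os.path.join(spark_output_dir, "part-*")))
    if len(part_files) != 1:
        raise ValueError(
            "expected exactly one part file, found %d" % len(part_files)
        )
    # Copy to a temporary name first, then swap it in, so the fixed filename
    # is replaced in a single step on every run.
    tmp_path = final_path + ".tmp"
    shutil.copyfile(part_files[0], tmp_path)
    os.replace(tmp_path, final_path)
    return final_path
```

In use, the DataFrame would first be written with something like example1.coalesce(1).write.option("header","true").mode("overwrite").csv(scratch_dir), and the helper would then be called with scratch_dir and the fixed target name. On an adl:// path the os and shutil calls do not apply; the same two steps would need the store's own filesystem API (on Databricks, dbutils.fs.ls and dbutils.fs.mv are the usual tools), which is an assumption about the environment rather than something shown in the post.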
10-19-2018
10:13 AM
Hi guys, I'm sorry if the question seems a little confusing. Basically, I would just like to save to a single file and have that file overwritten each time it is saved. Thanks
10-19-2018
09:49 AM
Sorry guys, I forgot to add the code:
example1 = spark.sql("""SELECT
CF.CountryName AS CountryCarsSold
,COUNT(CF.CountryName) AS NumberCountry
,MAX(CB.SalesDetailsID) AS TotalSold
FROM Data_SalesDetails CB
INNER JOIN Data_Sales CD
ON CB.SalesID = CD.SalesID
INNER JOIN Data_Customer CG
ON CD.CustomerID = CG.CustomerID
INNER JOIN Data_Country CF
ON CG.Country = CF.CountryISO2
GROUP BY CF.CountryName""")
example1.coalesce(1).write.mode("append").json("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/myoutput/myresults.json")
10-18-2018
10:45 PM
Sorry, I forgot to add the query:
example1 = spark.sql("""SELECT
CF.CountryName AS CountryCarsSold
,COUNT(CF.CountryName) AS NumberCountry
,MAX(CB.SalesDetailsID) AS TotalSold
FROM Data_SalesDetails CB
INNER JOIN Data_Sales CD
ON CB.SalesID = CD.SalesID
INNER JOIN Data_Customer CG
ON CD.CustomerID = CG.CustomerID
INNER JOIN Data_Country CF
ON CG.Country = CF.CountryISO2
GROUP BY CF.CountryName""")
example1.coalesce(1).write.mode("append").json("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/myoutput/myresults.json")
10-18-2018
10:19 PM
Hello community, I'm using the following script to output the results of a Spark SQL query to a file in Azure Data Lake Store. However, instead of creating a file called myresults.json and writing the results to it, the script writes the results to a randomly named file like part-0000-tid ... see image. Can someone let me know how to make sure the file is created and overwritten each time the PySpark query is run? Thanks
08-13-2018
04:11 PM
Hi Sandeep, thanks. It works very well. Thank you
08-13-2018
12:19 PM
Hi Sandeep, I should be clear about what I'm trying to achieve. I would like the output to include only the delta change. I thought that having the current date would be sufficient, but I just realized that the current date alone won't tell me whether the data has changed. Therefore, while you're helping me, could you also help me figure out how to include both the currentdate and the delta change in the data? Much appreciated. Cheers
08-13-2018
12:08 PM
I'm using Python version 3 and print(currentdate) worked. Thanks. However, when I run the full query I get the following error:
<ipython-input-22-8c743396e037> in <module>()
     18 FROM HumanResources_vEmployeeDepartment
     19 ORDER BY FirstName, LastName DESC""")
---> 20 counts.coalesce(1).write.csvCONCAT("/home/packt/Downloads/myresults7-"+currentdate+".csv")
'DataFrameWriter' object has no attribute 'csvCONCAT'
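The error above points at csvCONCAT, which is not a method on DataFrameWriter. CONCAT is a SQL function, whereas in Python the path is an ordinary string that can be built with + and passed to csv(). A minimal sketch of that path-building step follows. The date format is an assumption, since the post does not show how currentdate is defined, and dated_csv_path is a hypothetical helper name.

```python
from datetime import date


def dated_csv_path(prefix, day=None):
    """Return prefix + YYYY-MM-DD + '.csv'.

    Hypothetical helper. The YYYY-MM-DD format is an assumption; the post
    does not show how currentdate was built.
    """
    day = day or date.today()
    currentdate = day.strftime("%Y-%m-%d")
    return prefix + currentdate + ".csv"


# In Python 3, print is a function, so the parentheses are required.
print(dated_csv_path("/home/packt/Downloads/myresults7-"))

# With a DataFrame named counts, as in the post, the write would then be:
# counts.coalesce(1).write.csv(dated_csv_path("/home/packt/Downloads/myresults7-"))
```

Spark still treats the resulting path as a directory that holds a part-* file, so the date ends up in the directory name.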
08-13-2018
11:20 AM
I now get the following error:
  File "<ipython-input-13-588f4561c3f0>", line 7
    print currentdate()
    ^
SyntaxError: invalid syntax
The invalid syntax is currentdate(). Without the parentheses I get the following error:
  File "<ipython-input-14-8d268659919b>", line 1
    print currentdate
    ^
SyntaxError: Missing parentheses in call to 'print'