Created 08-13-2018 09:42 AM
Hello community,
I have created the following pyspark query:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('ops').getOrCreate()
df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/HumanResources_vEmployeeDepartment.csv', inferSchema=True, header=True)
df.createOrReplaceTempView('HumanResources_vEmployeeDepartment')
counts = spark.sql("""SELECT FirstName
,LastName
,JobTitle
FROM HumanResources_vEmployeeDepartment
ORDER BY FirstName, LastName DESC""")
counts.coalesce(1).write.csv("/home/packt/Downloads/myresults3.csv")
I would like to add the current date and time to the name of the output file, myresults3.
I think the code would look something like the following:
counts.coalesce(1).write.csvCONCAT("/home/packt/Downloads/'myresults3'-CURRENTDATE.csv")
I'm sure I'm way off the mark with the above attempt, but I'm sure you can see what I'm trying to achieve.
Any help will be appreciated.
Cheers
Carlton
Created 08-13-2018 01:43 PM
You can use mode("append") to append the new data to the existing output:
counts.coalesce(1).write.mode("append").csv("/home/packt/Downloads/myresults7-"+currentdate+".csv")
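For anyone reading along, here is a minimal sketch of how the append-mode write might be wrapped up. The helper name write_dated_csv and the header=True argument are my own assumptions, not part of the reply above. Note that Spark treats the path as a directory and writes part files inside it, so with mode("append") a second run on the same day adds another part file instead of failing because the path already exists.

```python
import datetime


def write_dated_csv(df, base_path, now=None):
    """Write df under '<base_path>-YYYY-MM-DD.csv' in append mode.

    df is expected to be a Spark DataFrame (for example the `counts`
    DataFrame from the question). write_dated_csv is a hypothetical helper.
    """
    if now is None:
        now = datetime.datetime.now()
    currentdate = now.strftime("%Y-%m-%d")
    out_path = base_path + "-" + currentdate + ".csv"
    # Spark creates a directory at out_path; "append" adds new part files
    # to it rather than raising because the path already exists.
    df.coalesce(1).write.mode("append").csv(out_path, header=True)
    return out_path
```

Usage would look like write_dated_csv(counts, "/home/packt/Downloads/myresults7").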
P.S. Please use 'reply' on this comment instead of writing a new comment, so that we can keep the conversation in order.
Created 08-13-2018 10:44 AM
@Carlton Patterson You can use Python's datetime module to obtain the current date.
import datetime
currentdate = datetime.datetime.now().strftime("%Y-%m-%d")
print currentdate
>>> 2018-08-13
Then use currentdate in the output file name.
counts.coalesce(1).write.csv("/home/packt/Downloads/myresults3-" + currentdate + ".csv")
Hope this helps.
P.S. If you want date and time use: datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
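A small sketch of the date-and-time variant, in case it helps. The helper name timestamped_path and the "%Y-%m-%d_%H-%M" format are my own assumptions: I swapped the space and colon for "_" and "-" because a colon in a file name can be rejected by Hadoop-style paths, and spaces are awkward on the command line.

```python
import datetime


def timestamped_path(prefix, now=None, fmt="%Y-%m-%d_%H-%M"):
    """Return '<prefix>-<timestamp>.csv' (timestamped_path is a hypothetical helper)."""
    if now is None:
        now = datetime.datetime.now()
    # Avoid ':' and spaces so the name is safe to use as a path.
    return prefix + "-" + now.strftime(fmt) + ".csv"


out_path = timestamped_path("/home/packt/Downloads/myresults3")
print(out_path)
# Then, in the Spark session from the question:
# counts.coalesce(1).write.csv(out_path)
```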
Created 08-13-2018 10:45 AM
Looks like there are 3 questions with same description...
Created 08-13-2018 11:04 AM
Sandeep,
Thanks for reaching out.
I'm getting the following error from the import statement:
File "<ipython-input-7-3dab170099f6>", line 3
import datetime currentdate = datetime.datetime.now().strftime("%Y-%m-%d")
^
SyntaxError: invalid syntax
Created 08-13-2018 11:11 AM
Looks like a text formatting issue: the two statements ended up on one line. Try this, with each statement on its own line:
import datetime
currentdate = datetime.datetime.now().strftime("%Y-%m-%d")
print currentdate
>>> 2018-08-13
Created 08-13-2018 11:05 AM
The syntax error is with 'currentdate'
Created 08-13-2018 11:20 AM
I now get the following error:
File "<ipython-input-13-588f4561c3f0>", line 7
print currentdate()
^
SyntaxError: invalid syntax
The invalid syntax is currentdate()
Without the parentheses I get the following error:
File "<ipython-input-14-8d268659919b>", line 1
print currentdate
^
SyntaxError: Missing parentheses in call to 'print'
Created 08-13-2018 11:26 AM
Which Python version are you using? If it's Python 3.x, print is a function, so use:
print(currentdate)
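If the same snippet has to run under both Python 2 and Python 3, one option (my suggestion, not something from the earlier replies) is to import print_function from __future__, which makes the parenthesised form behave the same way in both:

```python
from __future__ import print_function  # no effect on Python 3, enables print() on Python 2

import datetime

currentdate = datetime.datetime.now().strftime("%Y-%m-%d")
print(currentdate)  # prints today's date in YYYY-MM-DD form
```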
Created 08-13-2018 12:08 PM
I'm using Python 3 and print(currentdate) worked. Thanks. However, when I run the full query I get the following error:
<ipython-input-22-8c743396e037> in <module>()
     18 FROM HumanResources_vEmployeeDepartment
     19 ORDER BY FirstName, LastName DESC""")
---> 20 counts.coalesce(1).write.csvCONCAT("/home/packt/Downloads/myresults7-"+currentdate+".csv")
'DataFrameWriter' object has no attribute 'csvCONCAT'
Created 08-13-2018 12:19 PM
Hi Sandeep,
I should be clear about what I'm trying to achieve.
I would like the output to include only the delta change. I thought that having the current date would be sufficient, but I just realized that the current date alone won't tell me whether the data has changed.
Therefore, while you're helping me, could you also help me figure out how to include both the current date and the delta change in the data?
Much appreciated.
Cheers