How to concatenate a date to a filename in pyspark

avatar
Explorer

Hello community,

I have created the following pyspark query:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('ops').getOrCreate()
df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/HumanResources_vEmployeeDepartment.csv',inferSchema=True,header=True)
df.createOrReplaceTempView('HumanResources_vEmployeeDepartment')
counts = spark.sql("""SELECT
FirstName
,LastName
,JobTitle
FROM HumanResources_vEmployeeDepartment
ORDER BY FirstName, LastName DESC""")
counts.coalesce(1).write.csv("/home/packt/Downloads/myresults3.csv") 

I would like to add the current date and time to the name of the output file, myresults3.

I think the code would look something like the following:

counts.coalesce(1).write.csvCONCAT("/home/packt/Downloads/'myresults3'-CURRENTDATE.csv") 

I'm sure I'm way off the mark with the above attempt, but I'm sure you can see what I'm trying to achieve.

Any help will be appreciated.

Cheers

Carlton

1 ACCEPTED SOLUTION

avatar
@Carlton Patterson

You can use mode("append") to append the new data to the existing output.

counts.coalesce(1).write.mode("append").csv("/home/packt/Downloads/myresults7-"+currentdate+".csv")     
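For readers who want the pieces in one place, here is a minimal sketch that builds the dated path and performs the append-mode write. The helper name `write_dated_csv` and its parameters are hypothetical, and `df` is assumed to be a PySpark DataFrame such as `counts`. Note that Spark creates a directory at the given path and places the part file(s) inside it, so the ".csv" name refers to a directory.

```python
import datetime


def write_dated_csv(df, base_path, today=None):
    """Write df as CSV to '<base_path>-<YYYY-MM-DD>.csv' in append mode.

    Hypothetical helper: df is assumed to be a PySpark DataFrame.
    """
    if today is None:
        today = datetime.date.today()
    currentdate = today.strftime("%Y-%m-%d")
    path = base_path + "-" + currentdate + ".csv"
    # coalesce(1) yields a single part file inside the output directory;
    # mode("append") adds to existing output instead of raising an error
    # when the path already exists.
    df.coalesce(1).write.mode("append").csv(path)
    return path
```

It could be called as write_dated_csv(counts, "/home/packt/Downloads/myresults7").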

P.S. Please use 'reply' on this comment instead of writing a new comment. That way we can keep the conversation in order.


13 REPLIES

avatar

@Carlton Patterson You can use Python's datetime module to obtain the current date.

import datetime

currentdate = datetime.datetime.now().strftime("%Y-%m-%d")

print currentdate  # Python 2 syntax; on Python 3 use print(currentdate)

>>> 2018-08-13

Then use currentdate in the output file name.

counts.coalesce(1).write.csv("/home/packt/Downloads/myresults3-" + currentdate + ".csv") 

Hope this helps.

P.S. If you want the date and time, use: datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
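One caveat when the timestamp goes into a file name: the "%Y-%m-%d %H:%M" format contains a space and a colon, and a colon is not allowed in file names on some systems (Windows, for example) and may cause trouble in some path handling. The sketch below assumes a separator-free variant, "%Y-%m-%d_%H-%M", chosen here purely for illustration; `timestamped_path` is a hypothetical helper, not something from the thread.

```python
import datetime


def timestamped_path(base, now=None, fmt="%Y-%m-%d_%H-%M"):
    """Return '<base>-<timestamp>.csv'.

    Hypothetical helper; the default fmt is an illustrative assumption
    that avoids spaces and colons in the resulting name.
    """
    if now is None:
        now = datetime.datetime.now()
    return "{}-{}.csv".format(base, now.strftime(fmt))


# A fixed timestamp keeps the example reproducible.
example = timestamped_path(
    "/home/packt/Downloads/myresults3",
    now=datetime.datetime(2018, 8, 13, 9, 5),
)
print(example)  # /home/packt/Downloads/myresults3-2018-08-13_09-05.csv
```

The returned string can then be passed to write.csv(...) in place of the hard-coded path.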

avatar

Looks like there are 3 questions with the same description...

avatar
Explorer

Sandeep,

Thanks for reaching out.

I'm getting the following error from the import statement:

File "<ipython-input-7-3dab170099f6>", line 3
    import datetime currentdate = datetime.datetime.now().strftime("%Y-%m-%d")
                    ^
SyntaxError: invalid syntax

avatar

Looks like an issue with the text formatting: the import and the assignment ended up on the same line. Put each statement on its own line:

import datetime

currentdate = datetime.datetime.now().strftime("%Y-%m-%d")

print currentdate  # Python 2 syntax; on Python 3 use print(currentdate)

>>> 2018-08-13

avatar
Explorer

The syntax error is with 'currentdate'.

avatar
Explorer

I now get the following error:

File "<ipython-input-13-588f4561c3f0>", line 7
    print currentdate()
          ^
SyntaxError: invalid syntax

The invalid syntax is currentdate()

Without the parentheses I get the following error:

File "<ipython-input-14-8d268659919b>", line 1
    print currentdate
          ^
SyntaxError: Missing parentheses in call to 'print'

avatar

Which Python version are you using? If it's Python 3.x, print is a function and needs parentheses.

Use: print(currentdate)
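A quick way to confirm which interpreter the notebook is running is sketched below. With a single argument, the parenthesized form of print runs on both Python 2 and Python 3, so it is a safe choice when the version is uncertain.

```python
import datetime
import sys

currentdate = datetime.datetime.now().strftime("%Y-%m-%d")

# sys.version_info reports the interpreter version; major is 2 or 3.
print(sys.version_info.major)

# With a single argument, this form works on Python 2 and Python 3.
print(currentdate)
```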

avatar
Explorer

I'm using Python 3 and print(currentdate) worked. Thanks. However, when I run the full query I get the following error:

<ipython-input-22-8c743396e037> in <module>()
     18 FROM HumanResources_vEmployeeDepartment
     19 ORDER BY FirstName, LastName DESC""")
---> 20 counts.coalesce(1).write.csvCONCAT("/home/packt/Downloads/myresults7-"+currentdate+".csv")
AttributeError: 'DataFrameWriter' object has no attribute 'csvCONCAT'

avatar
Explorer

Hi Sandeep,

I should be clear about what I'm trying to achieve.

I would like the output to include only the delta change. I thought that having the current date would be sufficient, but I just realized that having just the currentdate won't let me know if there has been a change to the data.

Therefore, while you're helping me, could you also help me figure out how to include the currentdate and the delta change in the data?

Much appreciated.

Cheers
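One possible reading of "delta change" is the set of rows in the current result that were not in the previous run. Under that assumption, and assuming the previous output has been read back into a DataFrame with the same columns, PySpark's DataFrame.subtract returns the rows of one DataFrame that are absent from another. The sketch below is only an illustration: `delta_rows` and `write_delta` are hypothetical names, and this approach does not report rows that were deleted since the previous run.

```python
def delta_rows(current_df, previous_df):
    """Return the rows of current_df that do not appear in previous_df.

    Both arguments are assumed to be PySpark DataFrames with the same
    columns; subtract() behaves like SQL EXCEPT DISTINCT.
    """
    return current_df.subtract(previous_df)


def write_delta(current_df, previous_df, base_path, currentdate):
    """Write only the new or changed rows to '<base_path>-<currentdate>.csv'."""
    changed = delta_rows(current_df, previous_df)
    path = base_path + "-" + currentdate + ".csv"
    # Same write pattern as in the thread: one part file, append mode.
    changed.coalesce(1).write.mode("append").csv(path)
    return path
```

Here previous_df might come from spark.read.csv(...) on the prior day's output, read with the same header and schema settings that were used when it was written.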