How to concatenate a date to a filename in pyspark

New Contributor

Hello community,

I have created the following pyspark query:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('ops').getOrCreate()
df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/HumanResources_vEmployeeDepartment.csv',inferSchema=True,header=True)
df.createOrReplaceTempView('HumanResources_vEmployeeDepartment')
counts = spark.sql("""SELECT
FirstName
,LastName
,JobTitle
FROM HumanResources_vEmployeeDepartment
ORDER BY FirstName, LastName DESC""")
counts.coalesce(1).write.csv("/home/packt/Downloads/myresults3.csv") 

I would like to add the current date and time to the file called myresults3.

I think the code would look something like the following:

counts.coalesce(1).write.csvCONCAT("/home/packt/Downloads/'myresults3'-CURRENTDATE.csv") 

I'm sure I'm way off the mark with the above attempt, but I'm sure you can see what I'm trying to achieve.

Any help will be appreciated.

Cheers

Carlton

Accepted Solutions

Re: How to concatenate a date to a filename in pyspark

@Carlton Patterson

You can use mode("append") to append the new data to the existing output.

counts.coalesce(1).write.mode("append").csv("/home/packt/Downloads/myresults7-"+currentdate+".csv")     

P.S. Please use 'reply' on this comment instead of writing a new comment. That way we can keep the conversation in order.
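One thing to keep in mind with the write call above: Spark's DataFrameWriter.csv treats the path as a directory and writes part-*.csv files inside it, so "myresults7-2018-08-13.csv" ends up as a folder rather than a single file. If a single dated file is wanted, a minimal sketch could look like the following. The helper name promote_part_file is hypothetical, and the sketch assumes a local filesystem (not HDFS) and a fresh output directory containing exactly one part file, as coalesce(1) would produce.

```python
import glob
import os
import shutil


def promote_part_file(spark_output_dir, target_file):
    """Move the single part-*.csv file Spark wrote to target_file.

    Hypothetical helper for illustration: assumes a local filesystem
    and exactly one part file in spark_output_dir.
    """
    parts = glob.glob(os.path.join(spark_output_dir, "part-*.csv"))
    if len(parts) != 1:
        raise ValueError("expected exactly one part file, found %d" % len(parts))
    shutil.move(parts[0], target_file)
    return target_file


# Possible usage (paths are assumptions, not from the original post):
# counts.coalesce(1).write.mode("overwrite").csv("/home/packt/Downloads/tmp_results")
# promote_part_file("/home/packt/Downloads/tmp_results",
#                   "/home/packt/Downloads/myresults7-" + currentdate + ".csv")
```

With mode("append"), repeated runs into the same directory add more part files, which is why the sketch refuses to guess when it finds more than one.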

13 Replies

Re: How to concatenate a date to a filename in pyspark

@Carlton Patterson You can use Python's datetime module to obtain the current date.

import datetime

currentdate = datetime.datetime.now().strftime("%Y-%m-%d")

print currentdate

>>> 2018-08-13

And then use the currentdate in output file name.

counts.coalesce(1).write.csv("/home/packt/Downloads/myresults3-" + currentdate + ".csv") 

Hope this helps.

P.S. If you want date and time use: datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
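Note that the date-and-time format above puts a space and a colon into the name, and some filesystems and tools reject or mishandle those characters. A hedged sketch that keeps the same idea but uses only dashes and underscores is shown below. The function name dated_path and its parameters are illustrative assumptions, not part of any library.

```python
import datetime


def dated_path(directory, stem, now=None, with_time=False):
    """Build '<directory>/<stem>-<timestamp>.csv' as a plain string."""
    if now is None:
        now = datetime.datetime.now()
    # Avoid spaces and colons so the name is safe in most environments.
    fmt = "%Y-%m-%d_%H-%M" if with_time else "%Y-%m-%d"
    return "{}/{}-{}.csv".format(directory.rstrip("/"), stem, now.strftime(fmt))


# Example with a fixed timestamp so the result is reproducible.
example = dated_path(
    "/home/packt/Downloads", "myresults3",
    now=datetime.datetime(2018, 8, 13, 9, 30), with_time=True,
)
print(example)  # /home/packt/Downloads/myresults3-2018-08-13_09-30.csv
```

The returned string can be passed to counts.coalesce(1).write.csv(...) in the same way as the concatenated path in the reply.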

Re: How to concatenate a date to a filename in pyspark

Looks like there are 3 questions with the same description...

Re: How to concatenate a date to a filename in pyspark

New Contributor

Sandeep,

Thanks for reaching out.

I'm getting the following error from the import statement:

File "<ipython-input-7-3dab170099f6>", line 3
    import datetime currentdate = datetime.datetime.now().strftime("%Y-%m-%d")
                              ^
SyntaxError: invalid syntax

Re: How to concatenate a date to a filename in pyspark

Looks like some issue with text formatting. Try this:

import datetime

currentdate = datetime.datetime.now().strftime("%Y-%m-%d")

print currentdate

>>> 2018-08-13

Re: How to concatenate a date to a filename in pyspark

New Contributor

The syntax error is with 'currentdate'

Re: How to concatenate a date to a filename in pyspark

New Contributor

I now get the following error:

File "<ipython-input-13-588f4561c3f0>", line 7
    print currentdate()
                    ^
SyntaxError: invalid syntax

The invalid syntax is currentdate()

Without the parentheses I get the following error:

File "<ipython-input-14-8d268659919b>", line 1
    print currentdate
                    ^
SyntaxError: Missing parentheses in call to 'print'

Re: How to concatenate a date to a filename in pyspark

Which Python version are you using? If it's Python 3.x, print is a function and needs parentheses.

Use: print(currentdate)
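For context, print changed from a statement to a function in Python 3, which is why the statement form raises "Missing parentheses in call to 'print'". A small sketch of the date snippet written with call syntax follows; with a single argument this form also runs on Python 2. The filename variable is only an illustration of how the value is used afterwards.

```python
import datetime

# Same date string as in the earlier reply.
currentdate = datetime.datetime.now().strftime("%Y-%m-%d")

# Call syntax: required on Python 3, accepted on Python 2 for one argument.
print(currentdate)

# Illustrative: the string that would go into the output path.
filename = "myresults3-" + currentdate + ".csv"
print(filename)
```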

Re: How to concatenate a date to a filename in pyspark

New Contributor

I'm using Python version 3 and print(currentdate) worked. Thanks. However, when I run the full query I get the following error:

<ipython-input-22-8c743396e037> in <module>()
     18 FROM HumanResources_vEmployeeDepartment
     19 ORDER BY FirstName, LastName DESC""")
---> 20 counts.coalesce(1).write.csvCONCAT("/home/packt/Downloads/myresults7-"+currentdate+".csv")

'DataFrameWriter' object has no attribute 'csvCONCAT'

Re: How to concatenate a date to a filename in pyspark

New Contributor

Hi Sandeep,

I should be clear about what I'm trying to achieve.

I would like the output to include only the delta change. I thought that having the current date would be sufficient, but I just realized that having just the currentdate won't let me know if there has been a change to the data.

Therefore, while you're helping me, could you also help me figure out how to include the currentdate and the delta change in the data?

Much appreciated.

Cheers
