Move entire folder based on date to S3

Contributor

Use case: I need to transfer an entire folder of files, organized by date, to S3.

Example:

Source path (Linux): /users/abc/20200312/gtry/gyyy.csv
S3: /Users/datastore/20200312/gyyy.csv

 

Since the date changes every day, I need to build a dataflow that picks up files from the current date's folder.

1 ACCEPTED SOLUTION

Super Mentor

@Gubbi 

Depending on which processor is being used to create your FlowFiles from your source Linux directory, you will likely have an "absolute.path" FlowFile attribute created on each FlowFile, for example:

absolute.path = /users/abc/20200312/gtry/
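For example, if ListFile is the source processor (just an assumption on my part; GetFile writes the same attribute), a minimal configuration sketch might look like this:

ListFile
    Input Directory:           /users/abc
    Recurse Subdirectories:    true
    File Filter:               .*\.csv

ListFile writes the "absolute.path", "path", and "filename" attributes to each FlowFile it produces, and a downstream FetchFile processor can then retrieve the content using its default File to Fetch of ${absolute.path}/${filename}.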

 

You can pass that FlowFile to an UpdateAttribute processor, which can use NiFi Expression Language (EL) to extract the date from that absolute path into a new FlowFile attribute.

Add a new property (the property name becomes the new FlowFile attribute):

Property:          Value:

pathDate          ${absolute.path:getDelimitedField('4','/')}
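To see why field 4 is the right index: getDelimitedField is 1-based, and the leading "/" produces an empty first field, so the example path splits as:

/users/abc/20200312/gtry/
field 1 = ""   field 2 = "users"   field 3 = "abc"   field 4 = "20200312"   field 5 = "gtry"

This assumes the date folder always sits at the same depth. If the depth can vary, one alternative (assuming the date is always the only 8-digit directory name in the path) would be a regex extraction instead:

pathDate          ${absolute.path:replaceAll('.*/([0-9]{8}).*', '$1')}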

 

The resulting FlowFile will have a new attribute:
pathDate = 20200312

Now you can use that FlowFile attribute later when writing to your target directory in S3.

I assume you would use the PutS3Object processor for this?
If so, you can configure the "Object Key" property with the following:

/Users/datastore/${pathDate}/${filename}

 

NiFi EL will replace ${pathDate} with "20200312" and ${filename} will be replaced with "gyyy.csv".
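For reference, a minimal PutS3Object configuration sketch, assuming a hypothetical bucket named "my-datastore-bucket" and that credentials come from an AWS Credentials Provider controller service:

PutS3Object
    Bucket:        my-datastore-bucket                        (assumption - substitute your bucket)
    Object Key:    /Users/datastore/${pathDate}/${filename}
    Region:        us-east-1                                  (assumption - substitute your region)

With the example FlowFile, the object key evaluates to /Users/datastore/20200312/gyyy.csv.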

 

Hope this helps you,

Matt


3 REPLIES

Contributor

@Shu_ashu : I saw you had suggested a solution for a similar pattern before. Could you please look into this and suggest an approach?

 

https://community.cloudera.com/t5/Support-Questions/NiFi-Creating-the-output-directory-from-the-cont...

Contributor

@MattWho : Would appreciate your input as well.
