
Dynamic routing to putS3 Object based on file path


Hi,

We have a requirement to retrieve files from an SFTP server and place them into different prefixes in an S3 bucket.

 

Mapping is something like this:

 

ftp path1 -> file pattern match1 -> s3 prefix1

ftp path1 -> file pattern match2 -> s3 prefix2

ftp path2 -> file pattern match3 -> s3 prefix3

ftp path3 -> match4 -> s3 prefix4

 

ListSFTP -> FetchSFTP -> RouteOnAttribute 

RouteOnAttribute - Match1 (${filename:indexOf('pattern1'):gt(-1)}) - putS3Object1

RouteOnAttribute - Match2 (${filename:indexOf('pattern2'):gt(-1)}) - putS3Object2

 

We are using ListSFTP, FetchSFTP and RouteOnAttribute to route each file to a corresponding PutS3Object processor, with different variables for different patterns.

 

We initially had only 2 patterns, which grew into 10-15 patterns (Match1, Match2, ... MatchN) to be routed to the corresponding paths. If we follow the same approach, we will have to use 10-15 PutS3Object processors. Is there a way to avoid this?

 

Preferably, is there a way to match patterns in the file names against a lookup table that identifies the prefix for the corresponding pattern?

 

Is there a pattern that we can use to achieve this?

 

Thank you


Re: Dynamic routing to putS3 Object based on file path

Master Guru

Hello @littlesea374 

 

The PutS3Object processor supports NiFi Expression Language on the majority of its properties.

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-aws-nar/1.5.0/org.apache.nifi...

So rather than routing to a separate PutS3Object processor for each unique set of FlowFiles, you may consider using the Advanced UI in the UpdateAttribute processor to set FlowFile attributes to unique values based on your routing rules.

You essentially create a new "Rule" for each of your unique match criteria.
For each Rule you create an expression.  If the expression for a rule matches (resolves to "true"), all the actions will be applied to the matching FlowFile.
Those actions would simply be setting FlowFile attributes for the putS3Object processor properties.

Designing your dataflow in this manner allows you to scale up by simply adding new rules to the UpdateAttribute processor.  The only time you would need a different PutS3Object processor is if some match requires a different value for a property that does not support NiFi Expression Language (for example: different AWS credentials or region).
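
To make the rule idea concrete, here is a minimal plain-Python sketch of the logic, not NiFi code. It assumes a hypothetical attribute name `s3.prefix`, reuses the placeholder patterns from the question, and simplifies rule evaluation to "apply the actions of every matching rule, in order". In the flow itself, the single PutS3Object processor would then reference that attribute, for example with an Object Key along the lines of `${s3.prefix}/${filename}`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """One UpdateAttribute-style rule: a condition plus attributes to set."""
    name: str
    condition: Callable[[Dict[str, str]], bool]
    actions: Dict[str, str]


def contains(attribute: str, needle: str) -> Callable[[Dict[str, str]], bool]:
    """Python analogue of ${attribute:indexOf('needle'):gt(-1)}."""
    return lambda attrs: attrs.get(attribute, "").find(needle) > -1


def apply_rules(attrs: Dict[str, str], rules: List[Rule]) -> Dict[str, str]:
    """Return a copy of attrs with the actions of every matching rule applied."""
    result = dict(attrs)
    for rule in rules:
        if rule.condition(attrs):
            result.update(rule.actions)
    return result


# Hypothetical rules; 'pattern1' and 'prefix1' are placeholders from the question.
RULES = [
    Rule("Match1", contains("filename", "pattern1"), {"s3.prefix": "prefix1"}),
    Rule("Match2", contains("filename", "pattern2"), {"s3.prefix": "prefix2"}),
]


def object_key(attrs: Dict[str, str]) -> str:
    """What an Object Key of ${s3.prefix}/${filename} would resolve to."""
    return f"{attrs['s3.prefix']}/{attrs['filename']}"


flowfile = apply_rules({"filename": "daily_pattern2_001.csv"}, RULES)
print(object_key(flowfile))  # prints "prefix2/daily_pattern2_001.csv"
```

Adding a new source then means adding one more entry to the rule list, which mirrors adding one more rule in the Advanced UI rather than another PutS3Object processor.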

 

Hope this helps,

Matt

Re: Dynamic routing to putS3 Object based on file path

Thank you, Matt.

Using UpdateAttribute with rules will certainly be a modular approach, and I will continue with this option. We are looking at externalizing the configuration of the sources and destinations outside of the NiFi processes, as we get new sources and destinations frequently. If these rule mappings could be stored in a variable repository, or even in a flat file maintained in a git repository, that would really help as well.

 

Is it possible to externalize these rules to accommodate this? This is something we are looking at for the next iteration.

 

Thank you

 

Ram

Re: Dynamic routing to putS3 Object based on file path

Master Guru

@littlesea374 

 

There is no capability to externalize these rules used in the UpdateAttribute processor.

There are numerous "lookup" processors within NiFi; however, none of them integrate with git.

I think you may need a custom processor to do what you are looking for.  Contributions to the Apache NiFi community are always welcome.

NiFi also offers numerous scripting processors which allow you to write your own scripts. 
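
As a rough illustration of what such a script's core logic might look like, the sketch below resolves an S3 prefix from a flat mapping file. It is plain Python under stated assumptions: the CSV layout (`path`, `pattern`, `prefix` columns) and the function names are hypothetical, the sample rows reuse the placeholders from the original question, and the NiFi-specific parts (reading the FlowFile, setting the attribute, transferring it) are deliberately left out.

```python
import csv
import io
from typing import Iterable, List, Optional, Tuple

Mapping = Tuple[str, str, str]  # (source path, filename pattern, S3 prefix)


def load_mappings(lines: Iterable[str]) -> List[Mapping]:
    """Parse CSV lines with a header row of path,pattern,prefix."""
    reader = csv.DictReader(lines)
    return [(row["path"], row["pattern"], row["prefix"]) for row in reader]


def resolve_prefix(path: str, filename: str,
                   mappings: List[Mapping]) -> Optional[str]:
    """Return the prefix of the first row whose path and pattern both match."""
    for map_path, pattern, prefix in mappings:
        if path == map_path and pattern in filename:
            return prefix
    return None  # no rule matched; the caller decides how to route this


# Hypothetical file content; in practice this could be a file tracked in git.
SAMPLE = """path,pattern,prefix
path1,pattern1,prefix1
path1,pattern2,prefix2
path2,pattern3,prefix3
"""

mappings = load_mappings(io.StringIO(SAMPLE))
print(resolve_prefix("path1", "report_pattern2.csv", mappings))  # prints "prefix2"
```

With the mapping kept in a file, adding a source or destination becomes a one-line change to that file; how the file reaches the NiFi node (for example a checkout of the repository) is outside this sketch.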

 

Matt

 

 

Re: Dynamic routing to putS3 Object based on file path

 

Thank you, Matt. We will take this direction and explore a custom processor in subsequent iterations.

 

Ram
