Created on 10-10-2019 02:28 PM - last edited on 10-11-2019 12:49 AM by VidyaSargur
Hi,
We have a requirement to retrieve files from an SFTP server and place them into different prefixes in an S3 bucket.
Mapping is something like this:
ftp path1 -> file pattern match1 -> s3 prefix1
ftp path1 -> file pattern match2 -> s3 prefix2
ftp path2 -> file pattern match3 -> s3 prefix3
ftp path3 -> file pattern match4 -> s3 prefix4
ListSFTP -> FetchSFTP -> RouteOnAttribute
RouteOnAttribute - Match1 (${filename:indexOf('pattern1'):gt(-1)}) - putS3Object1
RouteOnAttribute - Match2 (${filename:indexOf('pattern2'):gt(-1)}) - putS3Object2
We are using ListSFTP, FetchSFTP and RouteOnAttribute to route each file to the corresponding PutS3Object processor, with different variables for different patterns.
We initially had only 2 patterns, which grew into 10-15 patterns (Match1, Match2, ... MatchN) to be routed to the corresponding paths. If we follow the same approach, we would have to use 10-15 PutS3Object processors. Is there a way to avoid this?
Preferably, is there a way to match the file names against a lookup table to identify the prefix for the corresponding pattern?
Is there a pattern that we can use to achieve this?
Thank you
Created 10-11-2019 06:58 AM
Hello @littlesea374
The PutS3Object processor supports NiFi Expression Language on the majority of its properties.
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-aws-nar/1.5.0/org.apache.nifi...
So rather than routing to a separate PutS3Object processor for each unique set of FlowFiles, you may consider using the Advanced UI in the UpdateAttribute processor to set FlowFile attributes to unique values based on your routing rules.
You essentially create a new "Rule" for each of your unique match criteria.
For each Rule you create an expression. If the expression for a rule matches (resolves to "true"), all the actions will be applied to the matching FlowFile.
Those actions simply set the FlowFile attributes that the PutS3Object processor properties then reference through Expression Language.
Designing your dataflow in this manner allows you to scale up by simply adding new rules to the UpdateAttribute processor. The only time you would need a different PutS3Object processor is if some match requires a different value for a property that does not support NiFi Expression Language (for example: different AWS credentials or region).
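To make the rule idea concrete, here is a minimal sketch in plain Python of what such a rule set amounts to. It is not NiFi code. The attribute name `s3.prefix`, the rule values, and the first-match-wins behaviour (which assumes the patterns are mutually exclusive) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical rules: (substring expected in the filename, S3 prefix to assign).
# Each tuple stands in for one "Rule" with a single action.
RULES: List[Tuple[str, str]] = [
    ("pattern1", "prefix1"),
    ("pattern2", "prefix2"),
]


def apply_rules(attributes: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the attributes with 's3.prefix' set by the first matching rule.

    Attributes are returned unchanged (no 's3.prefix') when no rule matches.
    """
    updated = dict(attributes)
    filename = attributes.get("filename", "")
    for pattern, prefix in RULES:
        # Same test as ${filename:indexOf('pattern1'):gt(-1)} in the original flow.
        if pattern in filename:
            updated["s3.prefix"] = prefix
            break
    return updated


def object_key(attributes: Dict[str, str]) -> str:
    """Build the key a single downstream processor could use for every file."""
    return f"{attributes['s3.prefix']}/{attributes['filename']}"
```

With something like this in place, one PutS3Object processor could build its object key from the attribute (for instance `${s3.prefix}/${filename}`, where `s3.prefix` is the assumed attribute name), and adding a new pattern means adding one rule rather than one processor.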
Hope this helps,
Matt
Created 10-11-2019 08:09 AM
Thank you, Matt.
Using UpdateAttribute with rules will certainly be a more modular approach, and I will continue with this option. We are also looking at externalizing the configuration of the sources and destinations outside of the NiFi processes, since we get new sources and destinations frequently. If these rule mappings could be stored in a variable repository, or even in a flat file maintained in a git repository, that would really help as well.
Is it possible to externalize these rules to accommodate this? This is something we are looking at for the next iteration.
Thank you
Ram
Created 10-11-2019 08:42 AM
There is no capability to externalize the rules used in the UpdateAttribute processor.
There are numerous "lookup" processors within NiFi; however, none of them integrate with git.
I think you may need a custom processor to do what you are looking for. Contributions to the Apache NiFi community are always welcome.
NiFi also offers numerous scripting processors which allow you to write your own scripts.
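As a rough sketch of what such a script could do, the plain Python below parses a flat mapping file (the kind that could be kept in a git repository) and resolves the S3 prefix for a given path and file name. The file layout, the function names `load_rules` and `resolve_prefix`, and the sample values are all hypothetical. Wiring this into a scripting processor such as ExecuteScript is not shown here and would need to be adapted to that processor's scripting API.

```python
import csv
import io
from typing import List, Optional, Tuple

Rule = Tuple[str, str, str]  # (sftp_path, filename_pattern, s3_prefix)


def load_rules(text: str) -> List[Rule]:
    """Parse 'sftp_path,filename_pattern,s3_prefix' rows.

    Blank lines and lines starting with '#' are skipped.
    """
    rules: List[Rule] = []
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip() or row[0].lstrip().startswith("#"):
            continue
        path, pattern, prefix = (field.strip() for field in row)
        rules.append((path, pattern, prefix))
    return rules


def resolve_prefix(rules: List[Rule], path: str, filename: str) -> Optional[str]:
    """Return the prefix of the first rule whose path equals `path` and whose
    pattern occurs in `filename`, or None when nothing matches."""
    for rule_path, pattern, prefix in rules:
        if path == rule_path and pattern in filename:
            return prefix
    return None


# Hypothetical mapping file content, mirroring the mapping in the question.
SAMPLE = """\
# sftp_path,filename_pattern,s3_prefix
/path1,pattern1,prefix1
/path1,pattern2,prefix2
/path2,pattern3,prefix3
"""
```

Keeping the mapping as data rather than as processor configuration means a new source or destination becomes a one-line change in the file, at the cost of maintaining the script that reads it.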
Matt
Created 10-11-2019 12:15 PM
Thank you, Matt. I will take this direction and explore a custom processor in subsequent iterations.
Ram