Member since: 10-10-2019
Posts: 8
Kudos Received: 0
Solutions: 0
11-18-2019
11:39 AM
Hi @MattWho,
We are pursuing Cloudera for a backport of this feature to our current release, as we are unable to upgrade HDF to a version that includes it. Meanwhile, I am trying to implement the same behavior using ExecuteScript (penalizing the flow file) and then eliminating duplicates.
When eliminating the duplicates, DetectDuplicate does not work as expected when two different filenames are introduced into the flow. Is this expected behavior? How do we handle this scenario so that we end up with a unique list of multiple files over a period of time?
The first GenerateFlowFile sets the filename to file1.txt and the second GenerateFlowFile sets it to file2.txt. When I disable one of the GenerateFlowFile processors, the duplicates are filtered as expected. But when both are running, we get duplicates.
Thank you
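For comparison, here is a minimal sketch of what I would expect from key-based duplicate filtering when two filenames are interleaved. This is not NiFi's DetectDuplicate implementation; the names `SeenCache`, `is_duplicate` and `ttl_seconds` are hypothetical and only illustrate the idea of one cache entry per filename with an expiry.

```python
class SeenCache:
    """Remembers when each key was first seen and forgets it after a TTL."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._first_seen = {}  # key -> time the key was (re)recorded

    def is_duplicate(self, key, now):
        """Return True if key was already recorded within the TTL window."""
        first = self._first_seen.get(key)
        if first is not None and now - first < self.ttl_seconds:
            return True
        # New key, or the previous entry has expired: record it again.
        self._first_seen[key] = now
        return False


cache = SeenCache(ttl_seconds=60)
# Two sources emitting file1.txt and file2.txt in turn.
events = [("file1.txt", 0), ("file2.txt", 1), ("file1.txt", 2), ("file2.txt", 3)]
results = [cache.is_duplicate(name, t) for name, t in events]
```

In this sketch each filename is tracked independently, so only the first occurrence of each name passes through while its entry is still within the TTL.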
11-14-2019
12:09 PM
Thank you @MattWho. If we are not able to upgrade to 1.10 or get the feature backported to our current version, would the Wait processor be an acceptable direction to pursue?
11-14-2019
09:55 AM
Hi,
We have a process that generates text files on an FTP server. The process writes each file in multiple passes and can take anywhere from about a minute to a few minutes to finish a file.
In this case NiFi's ListSFTP emits the same file multiple times, because we are tracking timestamps and the file's timestamp is updated continuously while the file is being generated.
We don't have control over the existing process that generates the files, and it has no notification mechanism that can tell us when file generation is complete.
What would be the best practice for handling this scenario in a List+FetchSFTP flow, so that duplicate files are not created in the target system?
Please advise.
Thank you
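To make the requirement concrete, the behavior we are after can be sketched roughly as follows: a file is treated as complete only once its last-modified time has been quiet for some period. This is plain Python rather than NiFi configuration, and the function name `stable_files`, the `min_age_seconds` threshold and the sample timestamps are assumptions for illustration only.

```python
def stable_files(listing, now, min_age_seconds=120):
    """Return names of files not modified for at least min_age_seconds.

    listing maps filename -> last-modified time in epoch seconds.
    """
    return sorted(
        name
        for name, mtime in listing.items()
        if now - mtime >= min_age_seconds
    )


# Hypothetical snapshot: a.txt was last touched 130 s ago, b.txt 40 s ago.
listing = {"a.txt": 1000, "b.txt": 1090}
ready = stable_files(listing, now=1130, min_age_seconds=120)
```

Here only a.txt would be picked up; b.txt would be left for a later listing, once it has stopped changing.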
Labels: Apache NiFi
10-11-2019
12:15 PM
Thank you, Matt. I will take this direction and explore a custom processor in subsequent iterations.
Ram
10-11-2019
08:09 AM
Thank you, Matt. Using UpdateAttribute with rules will certainly be a modular approach, and I will continue with this option.
We are looking at externalizing the configuration of the sources and destinations outside of the NiFi processes, as we get new sources and destinations frequently. It would really help if these rule mappings could be stored in a variable repository, or even in a flat file that can be maintained in a git repository. Is it possible to externalize these rules to accommodate this? This is something we are looking at for the next iteration.
Thank you
Ram
10-10-2019
02:28 PM
Hi,
We have a requirement to retrieve files from an SFTP server and place them into different prefixes in an S3 bucket.
The mapping is something like this:
ftp path1 -> file pattern match1 -> s3 prefix1
ftp path1 -> file pattern match2 -> s3 prefix2
ftp path2 -> file pattern match3 -> s3 prefix3
ftp path3 -> file pattern match4 -> s3 prefix4
ListSFTP -> FetchSFTP -> RouteOnAttribute
RouteOnAttribute - Match1 (${filename:indexOf('pattern1'):gt(-1)}) - putS3Object1
RouteOnAttribute - Match2 (${filename:indexOf('pattern2'):gt(-1)}) - putS3Object2
We are using ListSFTP, FetchSFTP and RouteOnAttribute to route each file to a corresponding PutS3Object processor, with different variables for different patterns.
We initially had only 2 patterns, which have grown into 10 to 15 patterns (Match1, Match2, ... MatchN), each routed to its corresponding path. If we follow the same approach, we will need 10 to 15 PutS3Object processors. Is there a way to avoid this?
Preferably, is there a way to match patterns in the file names against a lookup table to identify the prefix for the corresponding pattern?
Is there a design pattern that we can use to achieve this?
Thank you
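To illustrate the kind of lookup we have in mind, here is a rough sketch in plain Python (not NiFi configuration). The rule table, the glob patterns and the function name `resolve_prefix` are hypothetical; the idea is that a single step resolves the prefix, so that one PutS3Object could use the result instead of one processor per pattern.

```python
import fnmatch

# Hypothetical rule table: (SFTP path, filename glob, S3 prefix).
# Rules are checked in order and the first match wins.
RULES = [
    ("/path1", "*pattern1*", "prefix1/"),
    ("/path1", "*pattern2*", "prefix2/"),
    ("/path2", "*pattern3*", "prefix3/"),
]


def resolve_prefix(path, filename, rules=RULES, default=None):
    """Return the S3 prefix of the first rule matching path and filename."""
    for rule_path, glob, prefix in rules:
        if path == rule_path and fnmatch.fnmatchcase(filename, glob):
            return prefix
    return default


# Example: build the object key for a fetched file.
prefix = resolve_prefix("/path1", "daily_pattern2_001.txt")
object_key = prefix + "daily_pattern2_001.txt"
```

Adding a new source would then mean adding one row to the table rather than another route and another PutS3Object.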
Labels: Apache NiFi