Member since 09-29-2021 · 4 Posts · 0 Kudos Received · 0 Solutions
02-28-2023 11:28 AM · 1 Kudo
@TRSS_Cloudera The issue you have described matches this known issue reported in Apache NiFi: https://issues.apache.org/jira/browse/NIFI-10792

The discussion in the comments of that Jira points to a couple of workarounds, along with the drawbacks of each. From that discussion, it appears the best approach is the development of a new "Excel Record Reader" controller service that could be used by the existing ConvertRecord processor together with the CSVRecordSetWriter. This is outlined in the following Jira: https://issues.apache.org/jira/browse/NIFI-11167

If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click "Accept as Solution" below each response that helped.

Thank you,

Matt
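As an interim measure until NIFI-11167 is available, one option (a sketch of my own, not a workaround taken from that Jira) is to do the Excel-to-CSV conversion outside the record framework, for example with a small script invoked from an ExecuteStreamCommand processor. A minimal sketch, assuming pandas and openpyxl are installed; the command-line file paths are hypothetical:

```python
# Hedged sketch: convert the first sheet of an .xlsx file to CSV outside
# of NiFi, until an Excel Record Reader exists for ConvertRecord.
# Assumes pandas with the openpyxl engine installed; invocation such as
#   python xlsx_to_csv.py input.xlsx output.csv
# is a placeholder for illustration.
import sys

import pandas as pd

in_path, out_path = sys.argv[1], sys.argv[2]

# Read the first worksheet into a DataFrame and write it back out as CSV.
df = pd.read_excel(in_path, sheet_name=0, engine="openpyxl")
df.to_csv(out_path, index=False)
```

Once NIFI-11167 lands, ConvertRecord with the new Excel reader and the CSVRecordSetWriter would replace this kind of external conversion entirely.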
05-31-2022 08:53 AM
@TRSS_Cloudera Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks!
10-19-2021 01:55 PM · 1 Kudo
@TRSS_Cloudera It is not clear to me how you have designed your dataflow to remove all files from the source SFTP server except the newest file. Assuming state was not an issue (since you said your flow works if you manually clear state), how is your flow built?

There is a GetSFTP processor that does not maintain state. So you could have your flow use ListSFTP and FetchSFTP to always get the newest "history" file, and record that latest "history" file's last modified timestamp in something like a DistributedMapCache server. Then have GetSFTP run once a day using the "CRON driven" scheduling strategy to get all files in that directory with Delete Original = false (this would pick up the latest history file as well), fetch the stored last modified time from the map cache, and then via RouteOnAttribute route any FlowFiles whose last modified timestamp is older than the stored timestamp, finally sending those to a processor that removes them from the source SFTP server.

While the above would work in an "ideal" world, you would run into issues when an interruption in the running dataflow caused multiple new files to get listed by the ListSFTP processor, because you would not know which one ended up having its last modified timestamp stored in the DistributedMapCache. But even then, the worst case is a couple of files left lingering until the next run that lists just one new history file, at which point the flow goes back to the expected behavior.

Otherwise, there are script-based processors you could use to build your own scripted handling here. To be honest, it seems like wasted I/O to have NiFi consume the content of these files just to auto-terminate them, when you could use an ExecuteStreamCommand processor to invoke a script that connects to your SFTP server and simply removes what you do not want, without needing to pull anything across the network or write file content to NiFi that you don't need.

Hopefully this gives you some options to think about.

If you found this response assisted with your query, please take a moment to log in and click "Accept as Solution" below this post.

Thank you,

Matt
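To illustrate the ExecuteStreamCommand approach, here is a minimal sketch of such a cleanup script, assuming the paramiko Python library; the host, credentials, directory, and "history" filename prefix are all placeholder assumptions, not details from your environment:

```python
# Hedged sketch: delete all but the newest "history" file from a remote
# SFTP directory. Assumes paramiko is installed; host, credentials, the
# directory, and the filename prefix below are hypothetical placeholders.
import paramiko

SFTP_HOST = "sftp.example.com"   # placeholder host
SFTP_USER = "nifi"               # placeholder user
SFTP_PASS = "secret"             # placeholder password
REMOTE_DIR = "/data/history"     # placeholder directory

transport = paramiko.Transport((SFTP_HOST, 22))
transport.connect(username=SFTP_USER, password=SFTP_PASS)
sftp = paramiko.SFTPClient.from_transport(transport)
try:
    # listdir_attr returns SFTPAttributes entries, including st_mtime,
    # so no file content ever crosses the network.
    entries = [e for e in sftp.listdir_attr(REMOTE_DIR)
               if e.filename.startswith("history")]
    if entries:
        # Sort oldest first and keep only the newest file.
        entries.sort(key=lambda e: e.st_mtime)
        for old in entries[:-1]:
            sftp.remove(f"{REMOTE_DIR}/{old.filename}")
finally:
    sftp.close()
    transport.close()
```

ExecuteStreamCommand could then invoke this script on a CRON schedule, and NiFi never has to ingest the unwanted files at all.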
10-05-2021 11:33 PM
@TRSS_Cloudera Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.