Created 06-15-2021 09:17 PM
Hi
I need to retrieve a folder from another computer with NiFi over SFTP. I have been looking for a solution and noticed that the NiFi processors (GetFile, GetSFTP) only retrieve files, not folders. Is there any NiFi processor that can retrieve a whole folder?
Example:
Computer 1
directory: C:\User\localFolder
Computer 2
Using GetSFTP processor (remote path: C:\User) to extract the folder
Thanks
Created 06-18-2021 09:15 AM
@techNerd
Questions for you:
1. Is Computer B consuming all the files in LocalFolder and in every sub-directory within C:\User\LocalFolder? If yes, then your SFTP processor should be configured to Search Recursively.
2. Once consumed by Computer B, are the original files left on Computer A? If yes, you do NOT want to use the GetSFTP processor, as it will refetch the same files each time it executes because they are not being removed from Computer A.
A typical dataflow here would be a ListSFTP processor feeding a FetchSFTP processor.
The ListSFTP processor would be configured as follows:
The "Tracking Entities" listing strategy helps when source files may have older timestamps than files previously listed from a different directory. This listing strategy requires you to set an "Entity Tracking State Cache". There are multiple cache services to choose from. I simply used the "DistributedMapCacheClientService" controller service, configured to point at a "DistributedMapCacheServer" controller service that I also set up within the same NiFi.
The "Remote Path" is set to the topmost folder you want to start listing files from ( C:\User\LocalFolder\ ).
The "Search Recursively" property is set to "True" so that any new sub-folders added later are also searched.
The "Success" relationship is then routed via a connection from the ListSFTP to the FetchSFTP processor.
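To illustrate why the "Tracking Entities" strategy described above catches files that a timestamp-only approach could miss, here is a conceptual Python sketch. It is not how NiFi implements the strategy: the `cache` dictionary merely stands in for the cache service, and the function name and the tuple layout are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for one listed remote file:
# (full path, size in bytes, last-modified time in epoch seconds)
Entity = Tuple[str, int, int]


def list_new_entities(listing: Iterable[Entity],
                      seen: Dict[str, Tuple[int, int]]) -> List[Entity]:
    """Return the entities that are new or changed since the last listing.

    `seen` plays the role of the tracking cache: path -> (size, mtime).
    """
    fresh = []
    for path, size, mtime in listing:
        if seen.get(path) != (size, mtime):
            fresh.append((path, size, mtime))
            seen[path] = (size, mtime)
    return fresh


cache: Dict[str, Tuple[int, int]] = {}

# First run: only one file exists.
first = list_new_entities([("/tmp/LocalFolder/testfile-32.txt", 10, 2000)], cache)

# Second run: a sub-folder has appeared whose file has an OLDER timestamp.
second = list_new_entities(
    [
        ("/tmp/LocalFolder/testfile-32.txt", 10, 2000),
        ("/tmp/LocalFolder/SubFolder/testfile-31.txt", 7, 1000),
    ],
    cache,
)

print(first)
print(second)
```

In this sketch the second run reports only the sub-folder file, even though its timestamp is older than the file listed earlier, and the already-listed file is not reported again.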
The FlowFiles produced by the ListSFTP processor will have numerous FlowFile Attributes set on them that will be used later to persist your directory structure and to fetch the content via the FetchSFTP processor.
You can list the FlowFiles on a queue and click the "view details" icon next to any listed FlowFile to see the attributes currently assigned to that FlowFile.
For example:
filename = testfile-32.txt
path = /tmp/LocalFolder
sftp.remote.host = <SFTP server hostname>
sftp.remote.port = 22
Then, if you look at the attributes for a file from an added SubFolder:
filename = testfile-31.txt
path = /tmp/LocalFolder/SubFolder
sftp.remote.host = <SFTP server hostname>
sftp.remote.port = 22
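As a local analogy for the attributes shown above, the following Python sketch walks a directory tree and emits one attribute dictionary per file, with the containing directory recorded as `path`. It runs against a temporary local directory rather than an SFTP server, and `list_recursively` is a hypothetical helper name, not a NiFi API.

```python
import os
import tempfile
from typing import Dict, List


def list_recursively(base_dir: str) -> List[Dict[str, str]]:
    """Walk base_dir and return one attribute dict per file found."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(base_dir):
        for name in sorted(filenames):
            results.append({"filename": name, "path": dirpath})
    return results


with tempfile.TemporaryDirectory() as root:
    base = os.path.join(root, "LocalFolder")
    os.makedirs(os.path.join(base, "SubFolder"))
    for rel in ("testfile-32.txt", os.path.join("SubFolder", "testfile-31.txt")):
        with open(os.path.join(base, rel), "w") as handle:
            handle.write("demo")

    listed = list_recursively(base)
    # Report paths relative to the temporary root so the output is stable.
    listed = [
        {"filename": a["filename"], "path": os.path.relpath(a["path"], root)}
        for a in listed
    ]

for attrs in listed:
    print(attrs)
```

Each file ends up with its own `filename` and `path` pair, which is the information needed later to rebuild the folder layout on the target.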
The FetchSFTP processor then uses these attributes set on each FlowFile to fetch the actual content from the source SFTP server:
Downstream in your dataflow, after you have fetched the content, you can still use the attributes on each FlowFile when working with your FlowFiles.
For example, you can use the "path" attribute to dynamically control where a FlowFile will be written on a target system (this might be the local NiFi host via the PutFile processor, another SFTP server somewhere else, etc.).
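The idea of re-rooting the "path" attribute under a target directory can be sketched in Python as follows. This only models the string handling; in NiFi the equivalent would be done with Expression Language in the target processor's directory property. The target base `/data/LocalFolder` and the function name `target_directory` are assumptions made for the example.

```python
import posixpath
from typing import Dict


def target_directory(attrs: Dict[str, str], source_base: str, target_base: str) -> str:
    """Re-root the FlowFile's `path` attribute under target_base."""
    relative = posixpath.relpath(attrs["path"], source_base)
    return posixpath.normpath(posixpath.join(target_base, relative))


# Attribute values taken from the examples above.
top_level = {"filename": "testfile-32.txt", "path": "/tmp/LocalFolder"}
nested = {"filename": "testfile-31.txt", "path": "/tmp/LocalFolder/SubFolder"}

print(target_directory(top_level, "/tmp/LocalFolder", "/data/LocalFolder"))
print(target_directory(nested, "/tmp/LocalFolder", "/data/LocalFolder"))
```

A file from a newly added sub-folder lands in a sub-folder of the same name under the target base, without the flow needing to know that sub-folder in advance.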
If you found this addressed your query, please take a moment to log in and click "Accept" on all provided solutions that helped.
Thank you,
Matt
Created 06-16-2021 06:19 AM
@tech
Please share a little bit more around your use case.
The NiFi SFTP-based processors are designed to create a FlowFile for each file retrieved from the target SFTP server and path.
You want to retrieve a directory. NiFi has to write output from the retrieval to a FlowFile, so what are you looking to have written to that FlowFile? And what do you want NiFi to do with the folder once it is retrieved?
More typically in this situation, the ListSFTP [1]/FetchSFTP [2] processors (used with a NiFi cluster) or the GetSFTP [3] processor (used with a standalone NiFi) are configured to retrieve all files from within a target folder on the SFTP server. One FlowFile is produced for each file that is retrieved. These processors all write FlowFile attributes to the produced FlowFiles. This includes the "path" attribute that tracks the directory/folder the files were fetched from. NiFi Expression Language (NEL) [4] is used to interact with these FlowFile attributes within your NiFi dataflow(s).
Even if you did this outside of NiFi, you would get the folder and all files within it. With the "path" attribute above, each FlowFile knows its folder name, so you could use that attribute to write those FlowFiles out to a target within a folder of the same original name. So the same outcome can be accomplished.
You can then do many things depending on your use case with these files once they are retrieved.
Maybe use a MergeContent processor to create a tar file with all files added to it...
Maybe write these FlowFiles to a new target destination within a folder of the same name as the source...
etc.
[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apach...
[4] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
If this addressed your query, please take a moment to log in and click "Accept" on this solution.
Thank you,
Matt
Created 06-17-2021 06:58 PM
Hi @MattWho
The following is my use case:
There are 2 computers, A and B.
Computer A has a folder named "LocalFolder" (example: C:\User\LocalFolder\)
Computer B has a GetSFTP NiFi processor that listens to and extracts files from the "LocalFolder" folder on Computer A
Occasionally, Computer A will generate a new folder named "newFolder" inside "C:\User\LocalFolder" (example: C:\User\LocalFolder\newFolder), which is not known to Computer B in advance (it appears suddenly).
Is there any way Computer B can
1) Detect the new folder generated by Computer A and create the same folder on Computer B's side
2) Extract all the files in the Computer A folder into the newly generated folder on Computer B
Thanks. It would be a great help if there is an example with the internal NiFi configuration and an explanation. Much appreciated.
Created 06-23-2021 05:35 AM
@techNerd have you resolved your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Created 06-27-2021 11:29 PM
Created on 06-08-2023 03:25 AM - edited 06-08-2023 05:59 AM
Hi @techNerd
Is the issue solved?
I installed NiFi on a server and am using GetSFTP and PutSFTP together to get some data from a remote server. It is able to fetch the files, but the directory structure is not preserved. I've tried ${path}. I've also enabled "Search Recursively" in the GetSFTP properties.
To explain it better, I have shared the folder structure on the remote host:
and on the NiFi host after running the GetSFTP and PutSFTP processors:
It copies everything beginning from the root directory, i.e. '/'. Maybe the way it works is that it first executes pwd and copies accordingly.
The screenshot below shows the properties of GetSFTP and PutSFTP (I am not sharing the host and key details for security reasons).
Thanks
Created 06-09-2023 10:24 AM
@naveenb
Your query will get better visibility by starting a new question in the community rather than asking on an already solved question.
NiFi's ListSFTP and GetSFTP processors (GetSFTP is deprecated in favor of ListSFTP and FetchSFTP) only list/get files. When one of them generates a NiFi FlowFile from a file it finds recursively within the configured base directory on the source SFTP server, it adds a "path" attribute to that FlowFile. That "path" attribute holds the absolute path of the directory in which the file was found.
So, based on your configuration, the results you are seeing are expected, since you configured your PutSFTP with "/home/ubuntu/samplenifi/${path}",
where the "path" attribute on your FlowFiles resolves to "/home/nifiuser/nifitest/sample" for files found in that source subdirectory.
You can use NiFi Expression Language (NEL) to modify that "path" attribute string to get rid of the "/home/nifiuser" portion:
/home/ubuntu/samplenifi/${path:substringAfter('/home/nifiuser')}
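To see what that expression resolves to, here is a small Python sketch that mimics `substringAfter` as I understand it from the Expression Language guide (the text after the first occurrence of the argument, or the whole subject if the argument is absent). The `substring_after` helper is a hypothetical stand-in written for this example, not NiFi code.

```python
import posixpath


def substring_after(subject: str, marker: str) -> str:
    """Mimic substringAfter: text after the first occurrence of marker."""
    index = subject.find(marker)
    if index < 0:
        # Marker not found: the subject is returned unchanged.
        return subject
    return subject[index + len(marker):]


path_attr = "/home/nifiuser/nifitest/sample"

# Same concatenation as the expression above.
raw = "/home/ubuntu/samplenifi/" + substring_after(path_attr, "/home/nifiuser")
print(raw)                      # /home/ubuntu/samplenifi//nifitest/sample
print(posixpath.normpath(raw))  # /home/ubuntu/samplenifi/nifitest/sample
```

Note the doubled slash in the raw result, because both the literal prefix and the remaining path contribute one. That is normally harmless on a POSIX target, but if you prefer a clean path you could drop the trailing slash from the literal prefix.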
If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.
Thank you,
Matt