Can NiFi pull a folder instead of a file?

Contributor

Hi 

 

I need to retrieve a folder from another computer over SFTP using NiFi. I have been looking for a solution and noticed that the NiFi processors (GetFile, GetSFTP) only retrieve files, not folders. Is there a NiFi processor that can retrieve a whole folder?

Example:

 

Computer 1

directory: C:\User\localFolder

 

Computer 2 

Uses the GetSFTP processor (remote path: C:\User) to retrieve the folder

 

Thanks 

1 ACCEPTED SOLUTION

Super Mentor

@techNerd 

Questions for you:
1. Is Computer B consuming all the files in LocalFolder and in every sub-directory within C:\User\LocalFolder?  If yes, then your SFTP processor should be configured to Search Recursively.
2. Once consumed by Computer B, are the original files left on Computer A?  If yes, you do NOT want to use the GetSFTP processor, as it will refetch the same files over and over again each time it executes, since they are not being removed from Computer A.

 

A typical dataflow here would look like this:

MattWho_0-1624028337621.png

The ListSFTP processor would be configured to use as follows:

MattWho_5-1624029218626.png

The "Tracking Entities" listing strategy helps when source files may have older timestamps than files previously listed from a different directory.  This listing strategy requires you to set an "Entity Tracking State Cache".  There are multiple cache services to choose from.  I simply used the "DistributedMapCacheClientService" controller service, configured to point at a "DistributedMapCacheServer" controller service that I also set up within the same NiFi.
The "Remote Path" is set to the topmost folder you want to start listing files from ( C:\User\LocalFolder\ ).
The "Search Recursively" property is set to "True" so that any new sub folders added later are also searched.
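To make the idea behind entity tracking concrete, here is a minimal Python sketch. It is not NiFi code and does not reproduce how ListSFTP is implemented: it walks a local temporary directory instead of an SFTP server, and the `seen` dictionary is only a stand-in for the tracking state cache. The function names `list_new_entities` and `demo` are made up for this illustration.

```python
import os
import tempfile


def list_new_entities(root, seen):
    """Recursively walk `root` and return files not yet recorded in `seen`.

    `seen` maps a file path to the (size, mtime) pair observed when the file
    was last listed. It plays the role of a tracking state cache here.
    """
    new_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            stat = os.stat(full_path)
            fingerprint = (stat.st_size, stat.st_mtime_ns)
            # Report a file only if it is new or changed since the last listing.
            if seen.get(full_path) != fingerprint:
                seen[full_path] = fingerprint
                new_files.append(full_path)
    return new_files


def relative_names(paths, root):
    """Express paths relative to `root`, using forward slashes."""
    return [os.path.relpath(p, root).replace(os.sep, "/") for p in paths]


def demo():
    """Run two listings and return the relative paths reported by each."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "testfile-32.txt"), "w") as handle:
            handle.write("first")
        seen = {}
        first = relative_names(list_new_entities(root, seen), root)

        # A sub folder appears later, as described in the question.
        sub_folder = os.path.join(root, "SubFolder")
        os.makedirs(sub_folder)
        with open(os.path.join(sub_folder, "testfile-31.txt"), "w") as handle:
            handle.write("second")
        second = relative_names(list_new_entities(root, seen), root)
        return first, second
```

Calling `demo()` runs two listings: the second one reports only the file in the newly added sub folder, because the first file is already recorded in `seen`. This is why files left in place on the source are not picked up again.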

The "Success" relationship is then routed via a connection from the ListSFTP to the FetchSFTP processor.

 

The FlowFiles produced by the ListSFTP processor will have numerous FlowFile attributes set on them, which are used later to preserve your directory structure and to fetch the content via the FetchSFTP processor.
You can list the FlowFiles in a queue and click the "view details" icon next to any listed FlowFile to see the attributes currently assigned to that FlowFile.

For example:

 

filename = testfile-32.txt
path = /tmp/LocalFolder
sftp.remote.host = <SFTP server hostname>
sftp.remote.port = 22

 

Then, if you look at the attributes for a file from an added SubFolder:

 

filename = testfile-31.txt
path = /tmp/LocalFolder/SubFolder
sftp.remote.host = <SFTP server hostname>
sftp.remote.port = 22

 

 

The FetchSFTP processor then uses these attributes set on each FlowFile to fetch the actual content from the source SFTP server:

MattWho_6-1624032616946.png

Downstream in your dataflow, after you have fetched the content, you can still use the attributes on each FlowFile when working with your FlowFiles.

For example, you can use the "path" attribute to dynamically control where a FlowFile will be written on a target system (this might be the local NiFi host via the PutFile processor, another SFTP server somewhere else, etc.).
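As an illustration of using the "path" attribute to pick the target location, the following Python sketch maps a FlowFile's attributes onto a target directory. It only mimics the idea outside NiFi; the function name `target_directory` and the target base `/data/LocalFolder` are assumptions made for this example, while the attribute values come from the listings above.

```python
import posixpath


def target_directory(attributes, remote_base, target_base):
    """Mirror the part of `path` below `remote_base` under `target_base`."""
    relative = posixpath.relpath(attributes["path"], remote_base)
    if relative == ".":
        # The file sits directly in the base folder.
        return target_base
    if relative.startswith(".."):
        raise ValueError("path is not inside the remote base folder")
    return posixpath.join(target_base, relative)


# Attribute values taken from the example listings; the target base is hypothetical.
top_level = {"filename": "testfile-32.txt", "path": "/tmp/LocalFolder"}
nested = {"filename": "testfile-31.txt", "path": "/tmp/LocalFolder/SubFolder"}

print(target_directory(top_level, "/tmp/LocalFolder", "/data/LocalFolder"))
print(target_directory(nested, "/tmp/LocalFolder", "/data/LocalFolder"))
```

The file from SubFolder ends up in a SubFolder below the target base, so the directory structure of the source is recreated on the target.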

If you found this addressed your query, please take a moment to login and click "Accept" on all provided solutions that helped.

Thank you,

Matt


7 REPLIES

Super Mentor

@tech 

Please share a bit more about your use case.
The NiFi SFTP-based processors are designed to create a FlowFile for each file retrieved from the target SFTP server and path.

You want to retrieve a directory.  NiFi has to write the output of the retrieval to a FlowFile, so what do you expect to be written to that FlowFile?  And what do you want NiFi to do with the folder once it is retrieved?

More typically in this situation, the ListSFTP [1]/FetchSFTP [2] processors (used with a NiFi cluster) or the GetSFTP [3] processor (used with a standalone NiFi) are configured to retrieve all files from within a target folder on the SFTP server.  One FlowFile is produced for each file that is retrieved.  These processors all write FlowFile attributes to the FlowFiles they produce, including the "path" attribute, which tracks the directory/folder each file was fetched from.  NiFi Expression Language (NEL) [4] is what you use to interact with these FlowFile attributes within your NiFi dataflow(s).

Even if you did this outside of NiFi, you would get the folder and all files within it.  Because the "path" attribute tells each FlowFile which folder it came from, you can use that attribute to write the FlowFiles out to a target folder with the same name as the original.  So the same outcome can be accomplished.

You can then do many things with these files once they are retrieved, depending on your use case.
Maybe use a MergeContent processor to create a tar file containing all the files...
Maybe write these FlowFiles to a new target destination within a folder of the same name as the source...
etc.
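To picture the tar option, here is a small Python sketch that bundles several files into one tar archive while keeping each file's folder as part of its entry name. It uses Python's standard `tarfile` module and is not a description of how MergeContent works internally; the function names and the sample contents are made up for this illustration.

```python
import io
import tarfile


def bundle_as_tar(files):
    """Pack (path, filename, content) tuples into one in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for path, filename, content in files:
            # Keep the source folder as the directory part of the entry name.
            entry = tarfile.TarInfo(name=f"{path.strip('/')}/{filename}")
            entry.size = len(content)
            archive.addfile(entry, io.BytesIO(content))
    return buffer.getvalue()


def entry_names(tar_bytes):
    """Return the entry names stored in a tar archive, in archive order."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as archive:
        return archive.getnames()


# Sample data: the paths follow the "path" attribute idea, the contents are invented.
sample_files = [
    ("/tmp/LocalFolder", "testfile-32.txt", b"first"),
    ("/tmp/LocalFolder/SubFolder", "testfile-31.txt", b"second"),
]
print(entry_names(bundle_as_tar(sample_files)))
```

Because every entry name carries its folder, unpacking the archive on the target recreates the same folder layout as on the source.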

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apach...

[2]  https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apach...

[3] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apach...

[4] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html


If this addressed your query, please take a moment to login and click "Accept" on this solution.
Thank you,

Matt

Contributor

Hi @MattWho 

 

The following is my use case:

 

There are 2 computers, A and B.

Computer A has a folder named "LocalFolder" (example: C:\User\LocalFolder\)

Computer B has a GetSFTP NiFi processor that listens to and retrieves files from the "LocalFolder" folder on Computer A

 

Occasionally, Computer A will create a new folder named "newFolder" inside "C:\User\LocalFolder" (example: C:\User\LocalFolder\newFolder), which is not known to Computer B in advance (it appears suddenly).

 

Is there any way for Computer B to:

1) Detect the new folder created on Computer A and create a matching folder on its own side

2) Retrieve all the files in the Computer A folder into the newly created folder on Computer B

 

Thanks. It would be a great help if there is an example with the internal NiFi configuration and an explanation. Appreciated.

  

Community Manager

@techNerd Have you resolved your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.

Cy Jervis, Manager, Community Program

Contributor

Hi @MattWho ,

 

Thanks for your help. Appreciated.

New Contributor

Hi @techNerd
Has this issue been solved?
I installed NiFi on a server and am using GetSFTP and PutSFTP together to get some data from a remote server. It is able to fetch the files, but the directory structure is not preserved. I've tried ${path}, and I've also enabled 'Search Recursively' in the GetSFTP properties.
To make this clearer, here is the folder structure on the remote host:

naveenb_1-1686219711550.png

and on the NiFi host after running the GetSFTP and PutSFTP processors:

naveenb_0-1686225903503.png

It copies everything beginning from the root directory, i.e. '/'. Maybe the way it works is that it first executes pwd and copies accordingly.

 

The screenshot below shows the properties of GetSFTP and PutSFTP (I am not sharing the host and key details for security reasons).

naveenb_1-1686226021723.png

 

 

Thanks

Super Mentor

@naveenb 

Your query will get better visibility if you start a new question in the community rather than asking on an already solved question.

NiFi's ListSFTP and GetSFTP processors only list/get files (GetSFTP is deprecated in favor of ListSFTP and FetchSFTP).  When one of them generates a NiFi FlowFile from a file it finds recursively within the configured base directory on the source SFTP server, it adds a "path" attribute to that FlowFile.  That "path" attribute holds the absolute path of the directory containing the file.

So based on your configuration, the results you are seeing are expected, since you configured your PutSFTP with "/home/ubuntu/samplenifi/${path}"

where the "path" attribute on your FlowFiles resolves to "/home/nifiuser/nifitest/sample" for files found in that source subdirectory.

You can use NiFi Expression Language (NEL) to modify that "path" attribute string and get rid of the "/home/nifiuser" portion:

/home/ubuntu/samplenifi/${path:substringAfter('/home/nifiuser')}
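To see what that expression evaluates to, here is a small Python sketch that mimics it outside NiFi. The helper `substring_after` is written for this illustration and follows the documented behavior of the NEL function (return what follows the first match, or the whole subject when there is no match); `put_sftp_remote_path` is likewise a made-up name.

```python
import posixpath


def substring_after(subject, search):
    """Return the part of `subject` after the first occurrence of `search`.

    If `search` does not occur, the whole subject is returned unchanged.
    """
    index = subject.find(search)
    if index < 0:
        return subject
    return subject[index + len(search):]


def put_sftp_remote_path(path_attribute):
    """Mimic /home/ubuntu/samplenifi/${path:substringAfter('/home/nifiuser')}."""
    return "/home/ubuntu/samplenifi/" + substring_after(path_attribute, "/home/nifiuser")


result = put_sftp_remote_path("/home/nifiuser/nifitest/sample")
print(result)
print(posixpath.normpath(result))
```

The raw result contains a doubled slash (`/home/ubuntu/samplenifi//nifitest/sample`) because the remaining string still starts with `/`. Most systems treat that the same as a single slash, and `posixpath.normpath` shows the equivalent `/home/ubuntu/samplenifi/nifitest/sample`. Including the trailing slash in the search string (`'/home/nifiuser/'`) would avoid the doubled slash.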

 

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,

Matt