
can nifi pull a folder instead of file

Explorer

Hi 

 

I currently need to pull a folder from another computer using NiFi over SFTP. I have been looking for a solution and noticed that the NiFi processors (GetFile, GetSFTP) only retrieve files, not folders. Is there any NiFi processor that can retrieve a whole folder?

Example:

 

Computer 1

directory: C:\User\localFolder

 

Computer 2 

Using GetSFTP processor (remote path: C:\User) to extract the folder

 

Thanks 

1 ACCEPTED SOLUTION

Accepted Solutions

Re: can nifi pull a folder instead of file

Master Guru

@techNerd 

Questions for you:
1. Is Computer B consuming all the files in LocalFolder and in every sub-directory within C:\User\LocalFolder?  If yes, then your SFTP processor should be configured to Search Recursively.
2. Once consumed by Computer B, are the original files left on Computer A?  If yes, you do NOT want to use the GetSFTP processor, because it will re-fetch the same files every time it executes, since they are not being removed from Computer A.

 

A typical dataflow here would look like this:

MattWho_0-1624028337621.png

The ListSFTP processor would be configured as follows:

MattWho_5-1624029218626.png

The "Tracking Entities" listing strategy helps when source files may have older timestamps than files previously listed from a different directory.  Using this listing strategy requires you to set an "Entity Tracking State Cache".  There are multiple cache services to choose from.  I simply used the "DistributedMapCacheClientService" controller service, which is configured to point at a "DistributedMapCacheServer" controller service that I also set up within the same NiFi.
The "Remote Path" is set to the topmost folder you want to start listing files from ( C:\User\LocalFolder\ ).
The "Search Recursively" property is set to "True" so that any new subfolders that are added are also searched.
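
To illustrate why the listing strategy matters, here is a minimal Python sketch contrasting a timestamp-only listing with an entity-tracking one. It is a simplified model and not NiFi's actual implementation; the function names and the sample entries are made up for illustration.

```python
# Simplified model of two listing strategies; NOT NiFi's real code.
# Each remote entry is (path, last_modified_epoch_seconds, size_bytes).

def list_by_timestamp(entries, last_seen_ts):
    """Emit only entries strictly newer than the newest timestamp seen so far."""
    new = [e for e in entries if e[1] > last_seen_ts]
    newest = max([last_seen_ts] + [e[1] for e in new])
    return new, newest

def list_by_tracking(entries, cache):
    """Emit entries whose (timestamp, size) is not already recorded in the cache."""
    new = []
    for path, ts, size in entries:
        if cache.get(path) != (ts, size):
            cache[path] = (ts, size)
            new.append((path, ts, size))
    return new

first_listing = [("/tmp/LocalFolder/testfile-32.txt", 2000, 10)]
# A sub-folder shows up later, but its file carries an OLDER timestamp.
second_listing = first_listing + [
    ("/tmp/LocalFolder/SubFolder/testfile-31.txt", 1500, 10)
]

_, newest = list_by_timestamp(first_listing, 0)
missed, _ = list_by_timestamp(second_listing, newest)

cache = {}
list_by_tracking(first_listing, cache)
found = list_by_tracking(second_listing, cache)

print(missed)  # empty list: the older file is skipped by the timestamp approach
print(found)   # only the SubFolder file is reported as new
```

In this model the timestamp-only approach never sees the file in the new sub-folder, while the tracking approach picks it up and does not re-list the file it already knows.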

The "Success" relationship is then routed via a connection from the ListSFTP to the FetchSFTP processor.
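
Since a screenshot is hard to copy from, the settings described above can also be written down as a plain Python dictionary. This is only a sketch: the keys are meant to match the property labels in the ListSFTP configuration dialog, so check them against your NiFi version, and the values in angle brackets are placeholders rather than values from this thread. The `missing_settings` helper is hypothetical.

```python
# Text summary of the ListSFTP settings described above (a sketch, not an
# export from NiFi). Values in angle brackets are placeholders to fill in.
list_sftp_properties = {
    "Listing Strategy": "Tracking Entities",
    "Hostname": "<SFTP server hostname>",
    "Port": "22",
    "Username": "<sftp user>",
    "Remote Path": "C:\\User\\LocalFolder\\",
    "Search Recursively": "true",
    "Entity Tracking State Cache": "DistributedMapCacheClientService",
}

# Connection between the two processors: (source, relationship, destination).
connections = [("ListSFTP", "success", "FetchSFTP")]

def missing_settings(props):
    """Return the names of settings still holding a <placeholder> value."""
    return [k for k, v in props.items() if v.startswith("<") and v.endswith(">")]

print(missing_settings(list_sftp_properties))
```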

 

The FlowFiles produced by the ListSFTP processor will have numerous FlowFile Attributes set on them that will be used later to persist your directory structure and to fetch the content via the FetchSFTP processor.
You can list the FlowFiles on a queue and click the "view details" icon next to any listed FlowFile to see the attributes currently assigned to that FlowFile.

For example:

 

filename = testfile-32.txt
path = /tmp/LocalFolder
sftp.remote.host = <SFTP server hostname>
sftp.remote.port = 22

 

Then, if you look at the attributes for a file from an added SubFolder:

 

filename = testfile-31.txt
path = /tmp/LocalFolder/SubFolder
sftp.remote.host = <SFTP server hostname>
sftp.remote.port = 22

 

 

The FetchSFTP processor then uses these attributes set on each FlowFile to fetch the actual content from the source SFTP server:

MattWho_6-1624032616946.png
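
To make the attribute hand-off concrete, here is a small Python sketch that mimics how `${attribute}` references are resolved against a FlowFile's attributes. The property values shown are what one would typically expect in a FetchSFTP configuration like this, but treat them as assumptions. The `resolve` helper is hypothetical and handles only plain attribute references, not the full NiFi Expression Language, and the hostname is a placeholder.

```python
import re

def resolve(template, attributes):
    """Replace each ${name} in template with the matching FlowFile attribute."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: attributes[m.group(1)], template)

# Attributes as written by ListSFTP in the example above.
flowfile_attributes = {
    "filename": "testfile-31.txt",
    "path": "/tmp/LocalFolder/SubFolder",
    "sftp.remote.host": "sftp.example.com",  # placeholder hostname
    "sftp.remote.port": "22",
}

# Assumed FetchSFTP property values (illustrative only).
fetch_sftp_properties = {
    "Hostname": "${sftp.remote.host}",
    "Port": "${sftp.remote.port}",
    "Remote File": "${path}/${filename}",
}

resolved = {
    name: resolve(value, flowfile_attributes)
    for name, value in fetch_sftp_properties.items()
}
print(resolved["Remote File"])  # /tmp/LocalFolder/SubFolder/testfile-31.txt
```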

Downstream in your dataflow, after you have fetched the content, you can still use the attributes on each FlowFile when working with your FlowFiles.

For example, you can use the "path" attribute to dynamically control where a FlowFile will be written on a target system (this might be the local NiFi host via the PutFile processor, another SFTP server somewhere else, etc.).
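
As a sketch of that idea, the Python below derives a target directory from the "path" attribute so that the sub-folder layout under the listed root is reproduced on the target. The `/data/incoming` base directory and the `target_directory` helper are hypothetical; in NiFi you would express the same mapping with Expression Language in the destination processor's directory property.

```python
import posixpath

def target_directory(path_attr, listed_root, target_base):
    """Map a remote 'path' attribute to a directory under target_base,
    keeping whatever sub-folders sit below listed_root."""
    relative = posixpath.relpath(path_attr, start=listed_root)
    if relative == ".":
        return target_base
    return posixpath.join(target_base, relative)

root = "/tmp/LocalFolder"
top_level = target_directory("/tmp/LocalFolder", root, "/data/incoming")
nested = target_directory("/tmp/LocalFolder/SubFolder", root, "/data/incoming")

print(top_level)  # /data/incoming
print(nested)     # /data/incoming/SubFolder
```

With a mapping like this, a sub-folder that suddenly appears on the source gets a matching folder on the target, provided the destination is allowed to create missing directories.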

If you found this addressed your query, please take a moment to log in and click "Accept" on all provided solutions that helped.

Thank you,

Matt


5 REPLIES 5

Re: can nifi pull a folder instead of file

Master Guru

@tech 

Please share a little bit more around your use case.
The NiFi SFTP based processors are designed to create a FlowFile for each File retrieved from the target SFTP server and path.  

You want to retrieve a directory.  NiFi has to write output from the retrieval to a FlowFile, so what are you looking to have written to that FlowFile?  Then what do you want NiFi to do with the folder once it is retrieved?

More typically in this situation, the ListSFTP [1]/FetchSFTP [2] processors (used with a NiFi cluster) or the GetSFTP [3] processor (used with a standalone NiFi) are configured to retrieve all files from within a target folder on the SFTP server.  One FlowFile is produced for each file that is retrieved.  These processors all write FlowFile attributes to the produced FlowFiles, including the "path" attribute that tracks the directory/folder the files were fetched from.  NiFi Expression Language (NEL) [4] is used to interact with these FlowFile attributes within your NiFi dataflow(s).

Even if you did this outside of NiFi, you would get the folder and all the files within it.  With the "path" attribute above, each FlowFile knows its folder name, so you can use that attribute to write the FlowFiles out to a target under the same original folder name.  The same outcome can be accomplished.

You can then do many things depending on your use case with these files once they are retrieved.
Maybe use a MergeContent processor to create a tar file with all files added to it...
Maybe write these FlowFiles to a new target destination within a folder of the same name as the source...
etc.
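
As a rough illustration of the tar option, the Python sketch below packs a few in-memory "FlowFiles" into a tar archive, naming each entry from its "path" and "filename" attributes so the folder layout survives. It is a stand-in to show the idea, not how MergeContent is implemented; the `build_tar` helper and the sample attribute values are made up.

```python
import io
import tarfile

def build_tar(flowfiles):
    """Pack (attributes, content_bytes) pairs into an in-memory tar archive,
    naming each entry from its 'path' and 'filename' attributes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for attributes, content in flowfiles:
            name = attributes["path"].strip("/") + "/" + attributes["filename"]
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()

# Sample FlowFiles: attribute values are illustrative only.
flowfiles = [
    ({"path": "/tmp/LocalFolder", "filename": "testfile-32.txt"}, b"a"),
    ({"path": "/tmp/LocalFolder/SubFolder", "filename": "testfile-31.txt"}, b"b"),
]

data = build_tar(flowfiles)
with tarfile.open(fileobj=io.BytesIO(data)) as archive:
    names = archive.getnames()
print(names)
```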

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apach...

[2]  https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apach...

[3] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.13.2/org.apach...

[4] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html


If this addressed your query, please take a moment to log in and click "Accept" on this solution.
Thank you,

Matt

Re: can nifi pull a folder instead of file

Explorer

Hi @MattWho 

 

The following is my use case:

 

There are two computers, A and B.

Computer A has a folder named "LocalFolder" (example: C:\User\LocalFolder\).

Computer B has a GetSFTP NiFi processor that polls the "LocalFolder" folder on Computer A and retrieves the files inside it.

Occasionally, Computer A will create a new folder named "newFolder" inside "C:\User\LocalFolder" (example: C:\User\LocalFolder\newFolder) that Computer B does not know about (it appears suddenly).

Is there any way for Computer B to:

1) Detect the new folder created by Computer A and create the same folder on Computer B's side

2) Retrieve all the files in Computer A's folder into the newly created folder on Computer B

Thanks. It would be a great help if there were an example with the NiFi configuration and an explanation. Appreciated.

  


Re: can nifi pull a folder instead of file

Community Manager

@techNerd Have you resolved your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.

Cy Jervis, Manager, Community Program

Re: can nifi pull a folder instead of file

Explorer

Hi @MattWho ,

 

Thanks for your help. Appreciated.