
Can I get the files in the middle of the data flow?


New Contributor


I know I can get files with the GetFile processor, but it can only be used at the beginning of a data flow. How can I retrieve files in the middle of a data flow?

The reason is that I would like to pass a dynamically generated directory to GetFile or a similar processor, so the retrieval has to happen in the middle of the flow.

Accepted Solution

Re: Can I get the files in the middle of the data flow?

Super Guru

@adam chui

If you have the fully qualified filename, including the directory, in your flow, you can use the FetchFile processor. It accepts incoming connections, so you can pass the attributes (directory/filename) in the File to Fetch property to pull the file into the flow.

If you do not have the fully qualified filename, first list all the files in the directory with the ExecuteStreamCommand processor, passing the dynamically generated directory name as an argument. Then use the FetchFile processor to pull the required files into the data flow.

Please refer to this link, where I explain how to use the ExecuteStreamCommand processor to list all the files in a directory. In addition, to keep only the required filenames, you can use a RouteOnAttribute processor before the FetchFile processor.
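To make the filtering idea concrete, here is a rough shell analogue of what a RouteOnAttribute step placed before FetchFile would do: keep only the names you want from a listing. The helper name filter_listing and the .csv suffix are made up for illustration. In NiFi itself the condition would be written in Expression Language, for example something like ${filename:endsWith('.csv')}, assuming the name is already in the filename attribute.

```shell
# filter_listing is a hypothetical helper, not a NiFi feature.
# It reads file names on stdin (one per line) and prints only those
# that end with the suffix given as the first argument.
filter_listing() {
  suffix="$1"
  while IFS= read -r name; do
    case "$name" in
      # The quoted suffix is matched literally, not as a glob pattern.
      *"$suffix") printf '%s\n' "$name" ;;
    esac
  done
}

# Example: keep only the .csv names from a three-line listing.
printf '%s\n' report.csv notes.txt data.csv | filter_listing .csv
```

The example call prints report.csv and data.csv, and drops notes.txt.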


Re: Can I get the files in the middle of the data flow?

Super Guru

@adam chui

Sure.

I created a directory called nifi_test in the /tmp directory.

[bash tmp]$ mkdir nifi_test
[bash tmp]$ cd nifi_test/
[bash nifi_test]$ touch test.txt
[bash nifi_test]$ touch test1.txt
[bash nifi_test]$ touch test2.txt
[bash nifi_test]$ ll
total 0
-rw-r--r-- 1 nifi nifi 0 May 10 19:16 test1.txt
-rw-r--r-- 1 nifi nifi 0 May 10 19:16 test2.txt
-rw-r--r-- 1 nifi nifi 0 May 10 19:16 test.txt

Make sure NiFi has permission to read the files in the directory.
Let's assume the dynamically generated directory attribute has the value /tmp/nifi_test/ in the middle of the flow.

Now we need to fetch all the files in the /tmp/nifi_test directory.

Flow:

72737-flow.png

GenerateFlowFile configs:

I added a new property named directory with the value /tmp/nifi_test.

The flowfile now carries a directory attribute whose value is /tmp/nifi_test.

ExecuteStreamCommand configs:

72738-escommand.png

Here I pass the directory attribute as a command argument and list all the files in the directory (/tmp/nifi_test).
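Since the screenshot is not reproduced here, the exact property values are not known. A plausible setup is Command Path set to ls and Command Arguments set to ${directory}, but treat that as an assumption. The shell sketch below shows the equivalent command the processor would end up running; list_directory is a hypothetical helper name, and the demo uses a throwaway directory rather than /tmp/nifi_test.

```shell
# list_directory is a hypothetical helper that mirrors the listing step:
# it prints one entry per line for a directory supplied at run time.
list_directory() {
  # -1 forces one name per line; -- guards against names starting with "-".
  ls -1 -- "$1"
}

# Demo against a temporary directory that mimics the nifi_test example.
demo_dir="$(mktemp -d)"
touch "$demo_dir/test.txt" "$demo_dir/test1.txt" "$demo_dir/test2.txt"
list_directory "$demo_dir"
rm -rf "$demo_dir"
```

The output is the three file names, one per line. Their order depends on the locale's sort rules, so do not rely on it downstream.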

SplitText configs:
When the directory contains more than one file, use this processor to split the listing into individual flowfiles, one per line.

Set the Line Split Count property to 1.

ExtractText configs:

We need to pull all the files from the directory dynamically, so in the ExtractText processor add a new property named filename with the value

(.*)

This processor extracts the flowfile content and stores it in the filename attribute.

We now have both the directory and the filenames in the directory as attributes.
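As a rough illustration of the split and extract steps together, the shell sketch below turns a multi-line listing into one record per line and treats the whole line as the filename, which is approximately what SplitText with Line Split Count 1 followed by ExtractText with (.*) does. The helper name listing_to_filenames is hypothetical, and skipping empty lines is my own choice for the sketch, not something the processors are documented here to do.

```shell
# listing_to_filenames is a hypothetical helper, not part of NiFi.
# It reads a listing on stdin and prints "filename=<line>" for every
# non-empty line, i.e. one record per file name.
listing_to_filenames() {
  # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines so that no empty file name is produced.
    [ -n "$line" ] || continue
    printf 'filename=%s\n' "$line"
  done
}

# Example: the three names from the nifi_test listing.
printf 'test1.txt\ntest2.txt\ntest.txt\n' | listing_to_filenames
```

The example prints filename=test1.txt, filename=test2.txt and filename=test.txt on separate lines.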

FetchFile configs:

72739-fetchfile.png

In the File to Fetch property we use the directory and filename attributes to fetch the files from the directory. At the end of the flow in the screenshot, you can see that 3 files were fetched from the directory.

This way we can pull files in the middle of the flow.
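The fetch step boils down to joining the two attributes into a path and reading that file. In NiFi the File to Fetch value would presumably be an expression along the lines of ${directory}/${filename}; that exact value is an assumption, because the screenshot is not reproduced here. A minimal shell sketch of the same idea, with a hypothetical helper named fetch_files:

```shell
# fetch_files is a hypothetical helper, not part of NiFi.
# For each file name read from stdin it builds "<directory>/<filename>"
# and prints the content of that file.
fetch_files() {
  dir="$1"
  while IFS= read -r name; do
    path="$dir/$name"
    if [ -f "$path" ]; then
      cat -- "$path"
    else
      # Report names that cannot be fetched on stderr and carry on.
      printf 'missing: %s\n' "$path" >&2
    fi
  done
}

# Demo with a temporary directory and two small files.
demo_dir="$(mktemp -d)"
printf 'first\n' > "$demo_dir/test.txt"
printf 'second\n' > "$demo_dir/test1.txt"
printf 'test.txt\ntest1.txt\n' | fetch_files "$demo_dir"
rm -rf "$demo_dir"
```

The demo prints first and then second. A name that does not exist in the directory only produces a message on stderr, which is loosely comparable to routing a flowfile to a failure relationship instead of stopping the whole flow.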

I have attached my flow XML. Save it, upload it to your NiFi instance, and test it out.

fetch-files-189935.xml

Re: Can I get the files in the middle of the data flow?

New Contributor

Could you give me a concrete example of that?

If you do not have the fully qualified filename, first list all the files in the directory with the ExecuteStreamCommand processor, passing the dynamically generated directory name as an argument. Then use the FetchFile processor to pull the required files into the data flow.