Fetch Vs Get vs List processors in NiFi
Labels: Apache NiFi
Created 08-29-2019 01:53 AM
Hi,
I'm pretty new to NiFi and trying to understand the difference between the Fetch, Get and List processors.
List: as I understand it, this creates flow files that carry only metadata, not the data itself. That information can then be passed downstream to read the file contents.
I am confused about Get and Fetch, and which one should be used in which situation.
Created 08-29-2019 07:05 AM
Hello @Teej
The short answer is that FetchX processors (FetchFTP, for example) are NiFi cluster friendly, while GetX processors are not.
There is a common pattern ("List-Fetch"): run ListX on a single node, then distribute the resulting listing across all nodes in the cluster so they run FetchX in parallel. Each node fetches only the files named in the flow files it receives, so each file is fetched only once.
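To make the List-Fetch idea concrete, here is a rough Python model of it. This is not NiFi code or the NiFi API: the function names, the dictionary standing in for the remote directory, and the round-robin distribution are all assumptions chosen for illustration. It only shows why listing once and splitting the listing means each file is fetched a single time.

```python
from collections import Counter
from itertools import cycle


def list_files(remote_dir):
    """Model of a ListX step: emit metadata-only records, no file content."""
    return [
        {"path": path, "size": len(content)}
        for path, content in sorted(remote_dir.items())
    ]


def distribute(listing, nodes):
    """Round-robin the listing across nodes (a stand-in for however the
    flow files are actually spread over the cluster)."""
    assignments = {node: [] for node in nodes}
    for node, entry in zip(cycle(nodes), listing):
        assignments[node].append(entry)
    return assignments


def fetch(remote_dir, entries):
    """Model of a FetchX step: read content only for the entries received."""
    return {entry["path"]: remote_dir[entry["path"]] for entry in entries}


def list_fetch(remote_dir, nodes):
    """List on one node, then let every node fetch only its own share."""
    listing = list_files(remote_dir)          # runs once, on a single node
    assignments = distribute(listing, nodes)  # each entry goes to one node
    fetch_counts = Counter()
    for entries in assignments.values():
        fetch_counts.update(fetch(remote_dir, entries).keys())
    return assignments, fetch_counts
```

In this model, calling `list_fetch` with a handful of files and three hypothetical node names gives every file a fetch count of exactly one, because the listing is produced once and each entry is handed to a single node.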
If you have a NiFi cluster and you are using the GetSFTP processor, you would have to configure that processor to run on the primary node only so the other nodes in the cluster wouldn't try to pull the same files.
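The same point can be shown with a small Python model. Again, this is only a sketch and not NiFi code: `get_files`, `run_get`, and the convention that the first node in the list is the primary are hypothetical, and the model ignores any deletion of the source file and any timing between nodes.

```python
from collections import Counter


def get_files(remote_dir):
    """Model of a GetX step: list and read content in a single step."""
    return dict(remote_dir)


def run_get(remote_dir, nodes, primary_only):
    """Count how many times each file is pulled across the cluster.

    Assumption for this sketch: the first entry in `nodes` is the primary.
    """
    active = nodes[:1] if primary_only else nodes
    pulls = Counter()
    for _node in active:
        # Every active node independently pulls everything it can see.
        pulls.update(get_files(remote_dir).keys())
    return pulls
```

With three nodes and `primary_only=False`, each file is counted three times in this model; with `primary_only=True`, each file is counted once, which is the effect of restricting the processor to the primary node.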
You can read more about it here.
