
Fetch file from the NCM of a NIFI cluster


New Contributor

We have set up a NiFi cluster with an NCM and two slave nodes. We created a simple flow in the NCM GUI to pull a file from the local file system of the NCM and, after updating some attributes, load it into the local file system of one of the slave nodes. The GetFile processor is unable to recognize the input directory, which refers to a path in the file system of the NCM. Is it possible to set the 'Input Directory' property in GetFile so that it points to a specific path in the local file system of either the NCM or the slave nodes?

4 REPLIES

Re: Fetch file from the NCM of a NIFI cluster

Guru

The purpose of the NCM is solely to coordinate the flows on the DFM nodes. The DFM nodes then run the flows, so anything in those flows is strictly from the context of the DFM. In other words, the Input Directory in GetFile will always refer to the local filesystem on the 'slave' nodes. You may want to consider using a file share mounted on both nodes for this.

One other thing to note is that if you need a single point of ingest, you can schedule GetFile to run only on the primary node. This will, however, mean that all processing happens on one node, which is probably undesirable. A better model for the problem is to run ListFile on the primary node against a shared directory location, then use site-to-site back to the same cluster to load-balance a FetchFile processor. FetchFile continues the flow, hydrating the flowfiles with the content of each file from the shared spooling directory, before whatever other processing is required.
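To make the list-then-fetch idea concrete, here is a rough Python sketch of the pattern, not NiFi code. The function names (`list_files`, `distribute`, `fetch_file`) are made up for illustration, and the round-robin split is only an assumed stand-in for whatever distribution site-to-site actually performs. It also assumes every node can read the same shared directory.

```python
from itertools import cycle
from pathlib import Path


def list_files(shared_dir):
    """Primary-node step: emit one lightweight record per file (path only, no content)."""
    return sorted(str(p) for p in Path(shared_dir).iterdir() if p.is_file())


def distribute(listings, nodes):
    """Spread the listings over the nodes round-robin (assumed stand-in for site-to-site)."""
    assignment = {node: [] for node in nodes}
    for node, path in zip(cycle(nodes), listings):
        assignment[node].append(path)
    return assignment


def fetch_file(path):
    """Worker-node step: read the file content from the shared directory."""
    return Path(path).read_bytes()
```

The point of the split is that only the cheap listing runs on a single node, while reading the content and all later processing is spread across the cluster.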

Re: Fetch file from the NCM of a NIFI cluster

New Contributor

Thanks @Simon Elliston Ball for the input. I have a couple of questions regarding the second model (using ListFile + FetchFile) that you suggested:

  1. How is the load balancing done while the files are pulled by the FetchFile processor? Is it done automatically, or do we have to use some load balancer to do it?
  2. Will both the ListFile and FetchFile processors be created inside the GUI of the NCM, since we only have access to the GUI of the NCM of the cluster?

Re: Fetch file from the NCM of a NIFI cluster

Master Guru
@INDRANIL ROY

You can think of the NCM as the command and control for all the connected Nodes in your cluster. The NCM itself does not process any data or run any processors in your dataflow. When you added the GetFile processor to the canvas via the NCM UI, the NCM's job was to pass that request on to every Node so that each one added the processor. Once started, the GetFile processor runs only on the Nodes, and on each Node it checks only that Node's local file system, in the configured directory, for files.

Because of the functional responsibilities of the NCM, its hardware requirements are much lighter than those of your Nodes. The NCM never writes any data to the content, flowfile, or provenance repositories as your Nodes do, so very little hard drive space is needed. Since it is not running any processors, the NCM's CPU needs are less than your Nodes'. The NCM will still need a good-sized heap for retaining the component state reported to it from the Nodes via heartbeats. Since the resource requirements for the NCM are light, it is not uncommon to see a Node installed on the same server that hosts the NCM.

NiFi 1.0 (HDF 2.0) will introduce a new framework with many new enhancements. One of these changes is zero-master clustering, which eliminates the need for an NCM.
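As a toy illustration of the point above (this is a simplified model, not NiFi's API; the `Node` and `Manager` classes and their methods are invented for the example), the sketch below shows a manager that only replicates a GetFile configuration to the nodes, while each node resolves the configured directory against its own local file system:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Hypothetical stand-in for a local file system: directory -> file names
    local_fs: dict = field(default_factory=dict)
    processors: list = field(default_factory=list)

    def run_get_file(self, input_dir):
        # The directory is looked up only in this node's own file system.
        return list(self.local_fs.get(input_dir, []))


@dataclass
class Manager:
    nodes: list

    def add_get_file(self, input_dir):
        # The manager runs nothing itself; it replicates the config to every node.
        for node in self.nodes:
            node.processors.append(("GetFile", input_dir))

    def start(self):
        # Each node executes its own copy of the processor.
        return {
            n.name: [f for _, d in n.processors for f in n.run_get_file(d)]
            for n in self.nodes
        }
```

In this model, a file that exists only on the manager's own disk is never seen, because the manager has no processor of its own to run.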

Thanks,

Matt

Re: Fetch file from the NCM of a NIFI cluster

New Contributor

Thanks for the info