Member since: 07-30-2019
Posts: 3467
Kudos Received: 1641
Solutions: 1017

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 128 | 05-06-2026 09:16 AM |
| | 212 | 05-04-2026 05:20 AM |
| | 456 | 03-23-2026 05:44 AM |
| | 352 | 02-18-2026 09:59 AM |
| | 595 | 01-27-2026 12:46 PM |
11-08-2019
04:40 AM
1 Kudo
@ChampagneM12 When you install a NiFi cluster, you start with a blank canvas, so there is no data ingestion at first. The user must construct dataflow(s) to meet their individual use cases, as I am sure you know. How data ingestion behaves through an outage depends on your implementation.

Let's assume you are ingesting from Kafka into NiFi, since you mentioned you use Kafka. You would likely start that dataflow with a ConsumeKafka processor. Let's also assume you have a 3-node NiFi cluster and the Kafka topic you are consuming from has 12 partitions. Since all nodes in your cluster will be executing the ConsumeKafka processor, each will be a consumer of that topic. With a single concurrent task (the default) configured on ConsumeKafka, each of those 3 NiFi nodes' consumers will be assigned 4 partitions. If you were to set Concurrent Tasks to 4, you would then have a total of 12 consumers (one for each Kafka partition). Now let's assume one of your NiFi nodes goes down: Kafka will see the number of consumers drop from 12 to 8 and rebalance, so consumption will continue with some of those consumers now being assigned multiple partitions until the down NiFi node comes back online. That is just one scenario.

In the case of a NiFi Listen-type processor (example: ListenTCP), a TCP socket listener is started on each node in the NiFi cluster on the same port. In this case it would be the client or some external mechanism that would need to handle failover to a different node in the event a NiFi node goes down. This is typically handled with an external load balancer, which distributes data to all the NiFi nodes or switches to a different node when a node goes down.

In the use case of something like ListSFTP, the processor would be configured to run on the "primary node" only. ZooKeeper is responsible for electing a primary node and a cluster coordinator in a NiFi cluster. NiFi processor components like ListSFTP are designed for primary node execution only and store state about the data they have listed in cluster state (within ZooKeeper). If the currently elected primary node goes down, another node in the NiFi cluster is elected the new primary node, and the primary-node-only processors are started on that new node. The last recorded state for that component, reported to ZooKeeper by the previous primary node, is pulled from ZooKeeper by the new primary node's processor, and it picks up the listing from there. Again you have redundancy.

The only place in NiFi where you can have data delay is when a NiFi node goes down while it still has active data in its connection queue(s). Other nodes have no access to the data on the down node and cannot take over work on it. It will remain in that node's FlowFile and content repositories until that node has been restored and can continue processing those queued FlowFiles. So it is important to protect those two NiFi repositories using RAID-configured drives. You can minimize impact in such cases through good flow design and use of back pressure to limit the number of FlowFiles that can queue on a NiFi node. Also keep in mind that while the FlowFile and content repositories are tightly coupled to the flow.xml.gz, these items are not tightly coupled to a specific piece of hardware. You can stand up an entirely new node for your cluster, move the flow.xml.gz, content repository, and FlowFile repository onto that node before starting it, and that new node will continue processing the queued FlowFiles.

Hope this helps,

Matt
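If you want to watch the Kafka-side rebalance in the scenario above, here is a hedged sketch using the standard Kafka CLI. The broker address and group name are placeholders; the actual group is whatever Group ID is configured on your ConsumeKafka processor.

```bash
# Lists each partition of the topic with the consumer currently assigned
# to it; rerun after stopping a NiFi node to watch Kafka rebalance the
# 12 partitions across the remaining consumers.
bin/kafka-consumer-groups.sh --bootstrap-server broker1:9092 \
  --describe --group nifi-consumer-group
```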
11-07-2019
08:33 AM
1 Kudo
@ChampagneM12 Running multiple NiFi nodes from the same NiFi cluster on the same system is not recommended, but it can be done. It is possible by editing the nifi.properties file for each NiFi node so that it binds to its own unique ports for the following settings:

nifi.remote.input.socket.port=
nifi.web.http(s).port=
nifi.cluster.node.protocol.port=
nifi.cluster.load.balance.port=

On startup NiFi will bind to these ports, and multiple nodes on the same server cannot bind to the same port. Also keep in mind that multiple NiFi instances can NOT share resources like any of the repositories (database, flowfile, content, or provenance), local state directories, etc., so make sure those are all set to unique paths per node in the NiFi configuration files (nifi.properties, state-management.xml, authorizers.xml).

This will allow you to have multiple nodes loaded on the same server in the same NiFi cluster. You will, however, potentially run into issues when you start building your dataflows. Each instance will run its own copy of the dataflows you construct, so any processor or controller service you add that sets up a listener will not work, as only one node in your cluster will successfully bind to the configured port (there is no workaround for this). So total success here is going to depend in part on what kind of dataflows you will be building.

Hope this helps,

Matt
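As an illustration, a hedged sketch of how two nodes' nifi.properties might differ. The port numbers here are arbitrary examples, not defaults; any unused ports on your server will do.

```properties
# node 1 nifi.properties (example values)
nifi.web.https.port=9443
nifi.remote.input.socket.port=10443
nifi.cluster.node.protocol.port=11443
nifi.cluster.load.balance.port=6342

# node 2 nifi.properties (example values)
nifi.web.https.port=9444
nifi.remote.input.socket.port=10444
nifi.cluster.node.protocol.port=11444
nifi.cluster.load.balance.port=6343
```

Each node would likewise need its own repository and state paths, e.g. a distinct nifi.content.repository.directory.default per node.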
11-06-2019
08:43 AM
@LuxIsterica "filename" is also another FlowFile attribute that is created by default on every FlowFile that is created in NiFi. With some processors a filename can not be derived from or created based in the content that is received. ExecuteSQL (no inbound connection) and generateFlowFile processors are good examples here. In case like this, NiFi will just default to using the FlowFile's uuid as the filename also. Your statement "attribute "filename" that generated that executesql is "inherited" in all processors" is not accurate. Processors do not inherit attributes. A NiFi FlowFile exists of two parts: 1. FlowFile attributes/metadata -- These FlowFile attributes reside in heap memory and are also stored in the flowfile_repository. It is these attributes which "flow" from one processor component to another in you dataflow you build on the canvas. Processors then have access to these FlowFile Attributes when they execute against a given FlowFile from the inbound connection. Some processors as part of their execution will create additional attributes on a FlowFile before it is committed to the processor relationship that is assigned to a outbound connection. 2. FlowFile Content -- The actual content of a FlowFile is written to a claim in the content_repository. It is only access as needed by a processor. It does not reside in heap memory unless a processor needs to do so to perform its function. These FlowFile attributes can be changed as your FlowFile passes through different processors, but they belong to the FlowFile and not the processors at all. So there is nothing you need to "preserve/save" in most cases. Hope this adds some clarity, Matt
11-06-2019
07:09 AM
@LuxIsterica The unique UUID assigned to a NiFi processor component is not exposed to NiFi Expression Language (EL), so it is not something you can access dynamically via NiFi EL.

What I am confused by is your screenshot. It does not show the UUID of the processor; it is showing the unique UUID assigned to the FlowFile in position 1 on a connection. If what you are really looking for is the FlowFile UUID and not the processor component UUID, then that can be accessed via NiFi EL. The UUID is assigned by default to an attribute "uuid" (all lowercase) on every FlowFile created in NiFi, and it can be accessed using ${uuid} in NiFi EL. If you wanted to preserve that UUID in another FlowFile attribute, you could use an UpdateAttribute processor with a dynamic property such as:

Property: id
Value: ${uuid}

Thanks,

Matt
11-06-2019
06:01 AM
1 Kudo
@Cl0ck Please start a new community post for your new question.

Thank you,

Matt
11-05-2019
05:03 AM
@Paul Yang I was in no way implying that you should have removed your NiFi nodes' DNs as user identities in NiFi-Registry. The DN for every NiFi node must exist in NiFi-Registry and be granted both the proxy policy and read on the "Can manage buckets" policy. NiFi nodes will regularly read the buckets in NiFi-Registry to see if a newer version of your version-controlled PG exists (this is why read on "Can manage buckets" is needed). The "?" is displayed when the NiFi nodes cannot read the bucket. When a user in NiFi performs a version control action, the node will proxy the request on behalf of that user to NiFi-Registry. This is why all NiFi nodes must exist as users in NiFi-Registry and have the proxy policy granted to them. Only your initial admin user should have all policies except proxy; that user should never be proxying anything.

Thanks,

Matt
11-05-2019
04:50 AM
2 Kudos
@Cl0ck Anytime the NiFi process fails to start or shuts back down, the reason should be output in either the nifi-bootstrap.log (if startup failed during bootstrap) or the nifi-app.log (shutdown because of some exception during loading of the main NiFi child process). Start by looking at these logs for what the issue may be.

NiFi will fail to start if the service is already running, so execute ps -ef | grep nifi to see if there may already be some NiFi process still running.

There should have been no need to remove NiFi before going back and installing those additional services. Having multiple services running on the same host should not be an issue, as CFM has each of these services starting up on different ports by default. Running other services on the same host as NiFi is not recommended due to resource contention, but it is OK for just testing or playing around.

Hope this helps,

Matt
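A minimal sketch of those checks, assuming a default tarball install with the standard logs/ directory (adjust paths to your CFM installation layout):

```bash
# Look for startup/shutdown errors in the two relevant logs
tail -n 100 logs/nifi-bootstrap.log
tail -n 100 logs/nifi-app.log

# Check whether a NiFi process is already running
# (the [n] bracket keeps grep from matching its own process entry)
ps -ef | grep -i [n]ifi
```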
11-04-2019
06:00 AM
1 Kudo
@pxm NiFi sets no restriction on the data size that can be processed. Ingested data becomes the content portion of a NiFi FlowFile and is written to the content repository. The data is not read again unless a processor needs to read the content; otherwise, only the FlowFile attributes/metadata are passed from one processor component to another. So you need to make sure you have sufficient storage space for the NiFi content_repository. It is also strongly recommended that this be dedicated storage, separate from any other NiFi repository. Beyond that, any limitation here will be on network and disk I/O.

Thanks,

Matt
11-04-2019
05:52 AM
@Paul Yang A couple of observations based on the provided information:

1. You have clientAuth required (nifi.registry.security.needClientAuth=true). With this enabled in NiFi-Registry, the only authentication method supported will be 2-way TLS. If you want to support other authentication methods like SPNEGO, LDAP, and/or Kerberos, this property must be false.

2. Your users.xml (used for user authorization, not authentication) only contains one user. I am assuming the user is: CN=arch-fndtf04.beta1.fn, OU=NIFI. Your authorizations.xml also shows that this is the only user that has been authorized for a bunch of policies. The LDAP user you want to authorize must exist in the users.xml file. My guess here is that you did not set the "Initial Admin Identity" in your authorizers.xml (which you did not share). There are two providers in authorizers.xml where the initial admin comes into play (see the sketch after this post): the file-user-group-provider (only set the initial user identity here if your initial admin user is NOT coming from the ldap-user-group-provider) and the file-access-policy-provider (the initial admin must be set here so the initial admin policies are created for this user).

3. The users.xml and authorizations.xml files are only generated if they do not already exist. If you go back and add an initial admin to your authorizers.xml, you will need to delete/rename the existing users.xml and authorizations.xml files so new ones can be generated on startup.

4. Since needClientAuth was set to true and the UI clearly shows that the user string CN=arch-fndtf04.beta1.fn, OU=NIFI successfully authenticated, your browser must have this client certificate loaded and presented to the server when you navigated to the URL for your NiFi-Registry. I really see no reason why this certificate was loaded into your browser. NiFi-Registry, even when needClientAuth is false, will always try mutual TLS authentication first (it becomes a WANT instead of a REQUIRE when needClientAuth is false). If no client certificate is provided in the TLS handshake, the next auth method tried is SPNEGO (if configured), and finally a configured login provider from the identity-providers.xml.

Hope this info helps with correcting your setup issues.

Thanks,

Matt
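For point 2, a hedged sketch of the relevant authorizers.xml fragments. The identities shown are examples only; substitute your actual LDAP admin DN and node DNs.

```xml
<!-- Illustrative fragments, not a complete authorizers.xml -->
<userGroupProvider>
    <identifier>file-user-group-provider</identifier>
    <class>org.apache.nifi.registry.security.authorization.file.FileUserGroupProvider</class>
    <property name="Users File">./conf/users.xml</property>
    <!-- Only set an initial user identity here if the admin is NOT
         coming from the ldap-user-group-provider -->
    <property name="Initial User Identity 1">cn=admin,ou=people,dc=example,dc=com</property>
</userGroupProvider>

<accessPolicyProvider>
    <identifier>file-access-policy-provider</identifier>
    <class>org.apache.nifi.registry.security.authorization.file.FileAccessPolicyProvider</class>
    <property name="User Group Provider">file-user-group-provider</property>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <!-- The initial admin must always be set here so the admin policies
         get created for this user -->
    <property name="Initial Admin Identity">cn=admin,ou=people,dc=example,dc=com</property>
    <!-- One entry per NiFi node so the nodes get the proxy policy -->
    <property name="NiFi Identity 1">CN=arch-fndtf04.beta1.fn, OU=NIFI</property>
</accessPolicyProvider>
```

Remember point 3: after editing this file, remove or rename the existing users.xml and authorizations.xml so they are regenerated on startup.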
11-01-2019
04:55 AM
@Paul Yang NiFi stores information about the nodes that were connected to the cluster in its local state directory on each node. Prior to securing your NiFi cluster, all your nodes were running unsecured on port 8080, and that information was retained in local state. After securing your NiFi, your nodes were identified using the same hostnames but a different secure port, so your cluster is now expecting to see both the 8080 nodes and the new secure-port nodes in the cluster.

The easiest way to resolve this issue is to simply stop your NiFi nodes, remove the NiFi local state directory contents on each of your nodes, and then start your nodes.

Now, if this is not a new installation and you are concerned about losing locally stored NiFi component state (ListFile, for example, stores state about what was already listed), then you can get around this issue by changing the configured cluster protocol port (found in the nifi.properties file) to a different unused port on your servers. You will then be able to access the UI, where you will see your new nodes plus the unexpected ones on the old ports, which you can manually remove from the "Cluster" UI accessible from the NiFi Global menu in the upper right corner.

This is a known issue which is being addressed, but it only happens when you change ports (which commonly only happens when you are switching from unsecured to secure).

Hope this helps,

Matt
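For the first option, a minimal sketch assuming the default local state directory (./state/local, per the local-provider in state-management.xml); run on each node:

```bash
# Stop the node, clear the node/cluster info held in local state, restart
./bin/nifi.sh stop
rm -rf ./state/local/*
./bin/nifi.sh start
```

Keep in mind this also wipes locally stored component state, which is exactly why the protocol-port approach above is preferable on an established installation.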