Member since: 07-30-2019
Posts: 3421
Kudos Received: 1624
Solutions: 1009

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 158 | 01-09-2026 06:58 AM |
| | 496 | 12-17-2025 05:55 AM |
| | 557 | 12-15-2025 01:29 PM |
| | 555 | 12-15-2025 06:50 AM |
| | 405 | 12-05-2025 08:25 AM |
02-05-2024
06:38 AM
@hegdemahendra Curious about the differences between your prod and UAT environments here:
1. Same number of nodes in each environment's NiFi cluster?
2. Same exact configuration on the ConsumeKafka processor (except consumer group ID)?
3. Screenshot of the scheduling tab for the ConsumeKafka processor?
4. Versions of NiFi (Apache, CFM, HDF) used in both environments?
5. Any observed rebalancing or consumer-group-related exceptions in the logs?

Having more consumers in a consumer group than the number of partitions can lead to constant rebalancing. The number of consumers in the consumer group is calculated by multiplying the number of nodes in your NiFi cluster by the number of concurrent tasks configured on the ConsumeKafka processor. So if you have a 3 node cluster consuming from a topic with 3 partitions, you should have only 1 concurrent task, so that the number of consumers is equal to or less than the number of partitions (see the sketch below).

Thanks,
Matt
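As a minimal sketch of that arithmetic (the numbers here are placeholders, not taken from your environments):

```python
# Consumer-count check for a NiFi ConsumeKafka consumer group.
# All values below are illustrative placeholders.
nifi_nodes = 3
concurrent_tasks = 1      # "Concurrent Tasks" on the ConsumeKafka scheduling tab
topic_partitions = 3

consumers = nifi_nodes * concurrent_tasks   # one consumer per task per node
if consumers > topic_partitions:
    print(f"{consumers} consumers > {topic_partitions} partitions: expect constant rebalancing")
else:
    print(f"{consumers} consumers <= {topic_partitions} partitions: OK")
```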
02-02-2024
01:32 PM
1 Kudo
@Kiranq All components (processors, controller services, reporting tasks, etc.) execute as the user that owns the running NiFi service, so that NiFi service user would need to be able to execute the local system command. The user who authenticates to access the NiFi UI is not the user used to run the components. Have you tried executing your Python code from the command line as the NiFi service user?

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt
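A minimal sketch of that test, assuming the service account is named "nifi" and using a placeholder script path:

```python
# Run the same command NiFi would, but as the NiFi service user, to confirm
# that account has permission to execute it. "nifi" and the script path are
# assumptions; substitute your actual service account and script.
import subprocess

result = subprocess.run(
    ["sudo", "-u", "nifi", "python3", "/path/to/your_script.py"],
    capture_output=True, text=True,
)
print("exit code:", result.returncode)
print(result.stderr)
```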
02-02-2024
10:58 AM
1 Kudo
@oneofthemany You would get better traction from the community by starting a new question, since this question already has an accepted solution and has nothing to do with NiFi TLS-toolkit usage.

Thank you,
Matt
02-01-2024
07:09 AM
1 Kudo
@PriyankaMondal I am not clear on your statement: "if Nifi processor (any processor within a process group) stops suddenly due to load/any other issue". Are you saying you see a NiFi processor transition to a stopped state unexpectedly? This should never happen. Or are you saying the processor seems to stop processing FlowFiles even though it is currently in a running/started state?

NiFi queues FlowFiles in connections between processor components. A FlowFile is not removed from the inbound connection to a processor component until that FlowFile has been successfully processed by the consuming processor. The FlowFile consists of two parts:
1. FlowFile attributes/metadata, persisted in the NiFi flowfile_repository.
2. FlowFile content, persisted within claims inside the content_repository.

To protect from data loss, these repositories should be using protected storage such as RAID. So if NiFi or the server itself were to suddenly crash, when NiFi is restarted on that down node it will load its flow and then load the FlowFiles back into the connections. Processing will begin again against those FlowFiles by the downstream processor components.

NiFi's design favors data duplication over data loss in order to avoid data loss possibilities. For example: let's assume a NiFi processor completed execution against a FlowFile, resulting in writing something out to an external endpoint. In response to that successful operation, the processor would then move the FlowFile from the inbound connection to a downstream relationship. If NiFi were to crash in that very moment before the FlowFile was moved, on startup the same FlowFile would load in the inbound connection and get processed again.

Also keep in mind that you are running a 3 node NiFi cluster, and within a NiFi cluster each connected node runs its own copy of the flow, its own set of repositories, and its own local state. So each node is unaware of the FlowFiles being processed by another node in the same cluster.

Generally speaking, when you have a processor that shows an active threads indicator on it and zeroed out stats, you either have a very long running thread or a hung thread (only examination of a series of thread dumps can make that determination; see the sketch below). Most commonly this is a resource utilization problem, but it could also be a dataflow design issue, client library issue, or network issue.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt
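If you want to capture that series of thread dumps, a minimal sketch (assuming NIFI_HOME points at your install; nifi.sh's "dump" command writes a thread dump to the given file) might look like:

```python
# Capture several NiFi thread dumps spaced apart so a hung thread shows the
# same stack in each. NIFI_HOME default and output paths are illustrative.
import os
import subprocess
import time

nifi_home = os.environ.get("NIFI_HOME", "/opt/nifi")
for i in range(3):
    subprocess.run(
        [f"{nifi_home}/bin/nifi.sh", "dump", f"/tmp/thread-dump-{i}.txt"],
        check=True,
    )
    time.sleep(60)  # wait between dumps; a truly hung thread will not move
```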
02-01-2024
06:44 AM
@Sartha I don't know how to respond to "I followed the flow as per your guidance but still it doesn't worked." The flow I provided works. What exceptions/errors are you encountering? You have not provided much detail on the exceptions or issues you are seeing.

Thank you,
Matt
02-01-2024
06:38 AM
@ALWOSABY The PutHDFS processor has properties for changing the Remote Owner and Remote Group, but in order to use these properties certain conditions must be met: per the processor documentation, the owner and group are only changed if NiFi is running as a user that has HDFS superuser privilege, since changing ownership in HDFS requires it.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt
01-31-2024
08:28 AM
1 Kudo
@plapla The ConsumeKafka processor should not be reading the same message twice. The processor should be maintaining local state (since you are not clustered) in NiFi's local state directory. Make sure that you do not have disk space or permission issues that may prevent the writing of that local state. You can right click on the ConsumeKafka processor to view the current stored state.

The ConsumeKafka processor creates a consumer group using the Group ID configured in the processor, so make sure you do not have multiple ConsumeKafka processors consuming from the same Kafka topic using the same Group ID. For optimal performance, the number of concurrent tasks configured on the ConsumeKafka processor should match the number of partitions on the target topic.

Do you see any Kafka rebalancing going on? That will happen when you have more consumers than partitions in the consumer group that is consuming from that topic. A rebalance can affect the commit of the offset, resulting in possible data duplication. One way to check is to watch whether the group's committed offsets are advancing (see the sketch below).

Thanks,
Matt
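A minimal sketch using the kafka-python client (broker address, topic, and group ID are placeholders); re-running it while messages flow should show the committed offsets moving forward:

```python
# Inspect the committed offsets for the consumer group ConsumeKafka is using.
# If offsets are not advancing (or jump backwards), suspect rebalancing or
# a failure to commit. All connection values below are placeholders.
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(
    bootstrap_servers="broker:9092",   # placeholder broker
    group_id="nifi-consumer-group",    # must match the processor's Group ID
    enable_auto_commit=False,          # read-only inspection; do not commit
)
topic = "my-topic"                      # placeholder topic
for p in sorted(consumer.partitions_for_topic(topic) or []):
    tp = TopicPartition(topic, p)
    print(f"partition {p}: committed offset = {consumer.committed(tp)}")
consumer.close()
```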
01-30-2024
06:02 AM
@FrankHaha I am a little confused by your ask due to the terminology used. A NiFi template (deprecated and removed in NiFi 2.x) is a reusable NiFi dataflow snippet (a collection of interconnected components and controller services in XML format). Templates have been replaced by "flow definitions" (similar to templates, but in JSON format). You can't execute a template or a flow definition; you can deploy a template or flow definition to the canvas of an installed and running NiFi instance.

Anything you can do via the NiFi UI, you can also accomplish via NiFi REST API calls. The easiest way to learn which REST API calls are needed, and the format of each of those calls, is through your browser's built-in developer tools. You can perform each action via the UI and "capture as cURL" the REST API call that was made through the developer tools "Network" tab. This includes importing a flow definition or template, modifying imported components, enabling, starting, stopping, etc. You can put those calls into a script to perform those same commands later without using the UI (see the sketch below).

Another option might be the NiFi CLI toolkit, which offers a variety of commands for doing similar functions as the REST API calls.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt
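As one hedged illustration of scripting a captured call (host, process group ID, and token are placeholders): the UI's "start" action on a process group maps to a PUT against the flow/process-groups endpoint, roughly like this in Python:

```python
# Start all components in a process group via the NiFi REST API, mirroring
# what the UI does. Verify the exact call for your NiFi version by capturing
# it from the browser's Network tab first. All values below are placeholders.
import requests

nifi = "https://nifi-host:8443/nifi-api"   # placeholder host
pg_id = "your-process-group-uuid"          # placeholder process group id
token = "..."                              # bearer token from /access/token

resp = requests.put(
    f"{nifi}/flow/process-groups/{pg_id}",
    json={"id": pg_id, "state": "RUNNING"},
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
```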
01-29-2024
05:58 AM
@manishg Same amount of FlowFiles per second processed after switching to the volatile repositories? Perhaps having the FlowFile and provenance repositories in memory allows for faster processing of FlowFiles, resulting in more reads and writes to the content_repository, which contains the actual content of each FlowFile.

If your NiFi should crash or restart, you will lose everything in your volatile repositories. The FlowFile repository holds all the FlowFile metadata for the FlowFiles currently being processed through your dataflows. This means data loss in such events.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt
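For reference, a minimal sketch that flags volatile repository implementations in nifi.properties (the path is illustrative; the two property keys are the standard nifi.properties names):

```python
# Warn if the FlowFile or provenance repository is configured with a volatile
# (in-memory) implementation, since its contents are lost on restart or crash.
props_path = "/opt/nifi/conf/nifi.properties"   # placeholder install path

volatile_keys = {
    "nifi.flowfile.repository.implementation",
    "nifi.provenance.repository.implementation",
}
with open(props_path) as f:
    for line in f:
        key, _, value = line.strip().partition("=")
        if key in volatile_keys and "Volatile" in value:
            print(f"WARNING: {key} uses a volatile implementation: {value}")
```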
01-29-2024
05:44 AM
@PriyankaMondal Just to add to what @ckumar provided, the NiFi repositories are not locked to a specific node. What I mean by that is that they can be moved to a new node, with "new" being the key word there.

A typical prod NiFi setup will use protected storage for its flowfile_repository and content_repository(s), which hold all the FlowFile metadata and FlowFile content for all actively queued and archived FlowFiles on a node. To prevent loss of data, these repositories should be protected through the use of RAID storage or some other equivalent protected storage. The data stored in these repositories is tightly coupled to the flow.xml.gz/flow.json.gz that the cluster is running on every node.

Let's say you have a hardware failure; it may be faster to stand up a new server than to repair the existing hardware. You can simply move or copy the protected repositories to the new node before starting it (see the sketch below). When the node starts and joins your existing cluster, it will inherit the cluster's flow.xml.gz/flow.json.gz and then begin loading the FlowFiles from those moved repositories into the connection queues. Processing will continue exactly where it left off on the old node.

There is no way to merge repositories together, so you cannot add the contents of one node's repositories to the already existing repositories of another node. The provenance_repository holds lineage data, and the database_repository holds flow configuration history and some node-specific info. Neither of these is needed to preserve the actual FlowFiles.

Hope this helps,
Matt
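A minimal sketch of that copy step, assuming illustrative mount points; the destination directories must match what nifi.properties configures for the FlowFile and content repositories, and the copy must happen before the replacement node's first start:

```python
# Copy the FlowFile and content repositories from a failed node's protected
# storage onto a replacement node. Both paths below are placeholders.
import shutil

repos = ["flowfile_repository", "content_repository"]
old_mount = "/mnt/old-node-nifi"    # hypothetical recovered RAID mount
new_nifi = "/opt/nifi"              # hypothetical install on the new node

for repo in repos:
    shutil.copytree(f"{old_mount}/{repo}", f"{new_nifi}/{repo}", dirs_exist_ok=True)
```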