Member since
07-30-2019
3391
Posts
1618
Kudos Received
1000
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 251 | 11-05-2025 11:01 AM |
| | 157 | 11-05-2025 08:01 AM |
| | 491 | 10-20-2025 06:29 AM |
| | 631 | 10-10-2025 08:03 AM |
| | 402 | 10-08-2025 10:52 AM |
02-02-2024
01:32 PM
1 Kudo
@Kiranq All components (processors, controller services, reporting tasks, etc.) execute as the user that owns the running NiFi service, so that NiFi service user needs to be able to execute the local system command. The user who authenticates to access the NiFi UI is not the user that runs the components. Have you tried executing your Python code from the command line as the NiFi service user? If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
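One way to run the check suggested above is a short script executed as the NiFi service user. This is only a sketch: the function name `resolve_executable` is made up for illustration, and the way you switch to the service user (for example with `sudo -u`, and an account name such as `nifi`) is an assumption about your environment.

```python
import shutil


def resolve_executable(command):
    """Return the full path of `command` if the current user can execute it, else None."""
    # shutil.which accepts a bare name (searched on PATH) or an explicit path,
    # and only returns entries the current user is allowed to execute.
    return shutil.which(command)


def describe(command):
    """Build a one-line, human-readable result for the given command."""
    path = resolve_executable(command)
    if path is None:
        return f"{command!r}: not found on PATH or not executable by this user"
    return f"{command!r}: executable at {path}"
```

Calling `print(describe("python3"))` while logged in as the service user shows whether that account can see and execute the interpreter; substitute whatever command your processor actually invokes.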
02-01-2024
07:09 AM
1 Kudo
@PriyankaMondal I am not clear on your statement: "if Nifi processor (any processor within a process group) stops suddenly due to load/any other issue". Are you saying you see a NiFi processor transition to a stopped state unexpectedly? This should never happen. Or are you saying the processor seems to stop processing FlowFiles even though it is currently in a running/started state?

NiFi queues FlowFiles in the connections between processor components. A FlowFile is not removed from the inbound connection to a processor component until that FlowFile has been successfully processed by the consuming processor. A FlowFile consists of two parts:
1. FlowFile attributes/metadata, persisted in the NiFi flowfile_repository.
2. FlowFile content, persisted within claims inside the content_repository.

To protect against data loss, these repositories should be using protected storage such as RAID. So if NiFi were to suddenly crash, or the server itself were to crash, then when NiFi is restarted on that down node it will load its flow and then load the FlowFiles back into the connections. Processing will begin again against those FlowFiles by the downstream processor components.

NiFi's design favors data duplication over data loss. For example: let's assume that a NiFi processor completed execution against a FlowFile, resulting in writing something out to an external endpoint. In response to that successful operation, the processor would then move the FlowFile from the inbound connection to a downstream relationship. If NiFi were to crash in that very moment before the FlowFile was moved, on startup the same FlowFile would load in the inbound connection and get processed again.

Also keep in mind that you are running a 3 node NiFi cluster, and within a NiFi cluster each connected node runs its own copy of the flow, its own set of repositories, and its own local state. So each node is unaware of the FlowFiles being processed by another node in the same cluster.

Generally speaking, when a processor shows an active threads indicator and zeroed-out stats, you have either a very long-running thread or a hung thread (only examination of a series of thread dumps can make that determination). Most commonly this is a resource utilization problem, but it could also be a dataflow design issue, client library issue, or network issue. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
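The crash scenario described in this reply can be illustrated with a toy simulation. It is not NiFi code: `Connection`, `run_processor`, and `simulate_crash_and_restart` are hypothetical names, and the in-memory lists merely stand in for the persisted repositories and the external endpoint.

```python
class Connection:
    """Toy stand-in for a NiFi connection queue (hypothetical, not a NiFi class)."""

    def __init__(self, flowfiles=()):
        self.queue = list(flowfiles)


def run_processor(inbound, outbound, external_sink, crash_before_transfer=False):
    """Process every queued FlowFile: write it out, then transfer it downstream."""
    for flowfile in list(inbound.queue):
        external_sink.append(flowfile)  # side effect on the external endpoint
        if crash_before_transfer:
            # The write succeeded, but the FlowFile was never moved downstream.
            raise RuntimeError("simulated crash")
        inbound.queue.remove(flowfile)
        outbound.queue.append(flowfile)


def simulate_crash_and_restart():
    """Crash once after the external write, then 'restart' and process again."""
    inbound, outbound, sink = Connection(["ff-1"]), Connection(), []
    try:
        run_processor(inbound, outbound, sink, crash_before_transfer=True)
    except RuntimeError:
        pass  # after the crash the FlowFile is still queued on the inbound connection
    run_processor(inbound, outbound, sink)
    return inbound.queue, outbound.queue, sink
```

After the simulated restart the FlowFile has reached the downstream connection exactly once, but the external endpoint has received it twice. That is the duplication-over-loss trade-off, and it is why downstream systems benefit from being tolerant of repeated deliveries.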
02-01-2024
06:44 AM
@Sartha I don't know how to respond to "I followed the flow as per your guidance but still it doesn't worked." The flow I provided works. What exceptions/errors are you encountering? You have not provided much detail on what you are seeing exception/issue wise. Thank you, Matt
02-01-2024
06:38 AM
@ALWOSABY The PutHDFS processor has properties for changing the Remote Owner and Remote Group, but in order to use these properties certain conditions must be met: If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
01-31-2024
02:21 AM
1 Kudo
Upon examination, configuring the connection queue with the First In First Out prioritizer and setting the load balancing strategy to Partition by attribute, using the kafka.partition attribute, has proven effective in maintaining the order.
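The reason this combination preserves order can be sketched in a few lines. This is an illustration only: the CRC-based `pick_node` hash is an assumption and not NiFi's internal partitioning logic, and the `kafka.offset` values are sample data invented for the example.

```python
import zlib
from collections import defaultdict


def pick_node(attribute_value, node_count):
    """Map an attribute value to a node index (illustrative hash, not NiFi's)."""
    return zlib.crc32(str(attribute_value).encode("utf-8")) % node_count


def distribute(flowfiles, node_count, attribute="kafka.partition"):
    """Assign FlowFiles to nodes by attribute value, keeping arrival (FIFO) order."""
    per_node = defaultdict(list)
    for flowfile in flowfiles:  # iterate in arrival order
        node = pick_node(flowfile[attribute], node_count)
        per_node[node].append(flowfile)
    return dict(per_node)


# Sample FlowFiles, listed in the order they arrived.
flowfiles = [
    {"kafka.partition": "0", "kafka.offset": 10},
    {"kafka.partition": "1", "kafka.offset": 7},
    {"kafka.partition": "0", "kafka.offset": 11},
    {"kafka.partition": "1", "kafka.offset": 8},
]
assignment = distribute(flowfiles, node_count=3)
```

Because every FlowFile with the same `kafka.partition` value lands on the same node, and each node consumes its queue first in, first out, the offsets within a partition stay in their original order.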
01-30-2024
06:02 AM
@FrankHaha I am a little confused by your ask due to the terminology used. A NiFi template (deprecated and removed in NiFi 2.x) is a reusable NiFi dataflow snippet (a collection of interconnected components and controller services in XML format). Templates have been replaced by "flow definitions" (similar to templates but in JSON format). You can't execute a template or a flow definition. You can deploy a template or flow definition to the canvas of an installed and running NiFi instance. Anything you can do via the NiFi UI, you can also accomplish via NiFi rest-api calls. The easiest way to learn which rest-api calls are needed, and the format of each of those calls, is through your browser's built-in developer tools. You can perform each action via the UI and, through the "network" tab of the browser's developer tools, "capture as curl" the rest-api call that was made. This includes importing a flow definition or template, modifying the imported components, enabling, starting, stopping, etc. You can put those calls into a script to perform those same commands later without using the UI. Another option might be the NiFi CLI toolkit, which offers a variety of commands for doing similar functions as the rest-api calls. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
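As a rough sketch of what one scripted call might look like, the following builds (but does not send) a request to start or stop every component in a process group. The helper name `build_schedule_request`, the base URL, the process group id, and the bearer token are all placeholders, and the endpoint path and payload should be confirmed against a call captured from your own NiFi version with the browser developer tools as described above.

```python
import json
import urllib.request


def build_schedule_request(base_url, process_group_id, state, token):
    """Build (but do not send) a request that starts or stops a process group."""
    if state not in ("RUNNING", "STOPPED"):
        raise ValueError("state must be 'RUNNING' or 'STOPPED'")
    # Payload shape assumed from the schedule-components call the UI makes.
    body = json.dumps({"id": process_group_id, "state": state}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/nifi-api/flow/process-groups/{process_group_id}",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Assumes token-based authentication; adjust for your security setup.
            "Authorization": f"Bearer {token}",
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` once the host, credentials, and TLS settings match your installation. Building the request separately from sending it makes it easy to compare against the captured curl command first.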
01-29-2024
05:58 AM
@manishg Are you seeing the same number of FlowFiles processed per second after switching to the volatile repositories? Perhaps having the FlowFile and provenance repositories in memory allows for faster processing of FlowFiles, resulting in more reads and writes to the content_repository, which contains the actual content of each FlowFile. If your NiFi should crash or restart, you will lose everything in your volatile repositories. The FlowFile repository holds all the FlowFile metadata for the FlowFiles currently being processed through your dataflows, so a crash or restart means data loss. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
01-29-2024
05:44 AM
@PriyankaMondal Just to add to what @ckumar provided, the NiFi repositories are not locked to a specific node. What I mean by that is that they can be moved to a new node, with "new" being the key word there. A typical prod NiFi setup will use protected storage for its flowfile_repository and content_repository(s), which hold all the FlowFile metadata and FlowFile content for all actively queued and archived FlowFiles on a node. To prevent loss of data, these repositories should be protected through the use of RAID storage or some other equivalent protected storage. The data stored in these repositories is tightly coupled to the flow.xml.gz/flow.json.gz that the cluster is running on every node. Let's say you have a hardware failure; it may be faster to stand up a new server than to repair the existing hardware. You can simply move or copy the protected repositories to the new node before starting it. When the node starts and joins your existing cluster, it will inherit the cluster's flow.xml.gz/flow.json.gz and then begin loading the FlowFiles from those moved repositories into the connection queues. Processing will continue exactly where it left off on the old node. There is no way to merge repositories together, so you cannot add the contents of one node's repositories to the already existing repositories of another node. The provenance_repository holds lineage data, and the database_repository holds flow configuration history and some node-specific info. Neither of these is needed to preserve the actual FlowFiles. Hope this helps, Matt
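The copy step could be scripted along these lines. This is a minimal sketch under stated assumptions: the helper `copy_repositories` is hypothetical, the directory names are the ones mentioned in the reply and may differ in your nifi.properties (including additional content repositories), NiFi is assumed to be stopped on both sides, and both locations are assumed to be reachable as local paths.

```python
import shutil
from pathlib import Path

# Repository directory names assumed from the post; check nifi.properties for yours.
REPOSITORIES = ("flowfile_repository", "content_repository")


def copy_repositories(old_root, new_root, names=REPOSITORIES):
    """Copy each repository directory from old_root to new_root.

    Refuses to run if a target already exists, because repositories
    cannot be merged into existing ones.
    """
    old_root, new_root = Path(old_root), Path(new_root)
    # Validate everything up front so a failure does not leave a partial copy.
    for name in names:
        if not (old_root / name).is_dir():
            raise FileNotFoundError(f"missing source repository: {old_root / name}")
        if (new_root / name).exists():
            raise FileExistsError(f"target already exists: {new_root / name}")
    copied = []
    for name in names:
        shutil.copytree(old_root / name, new_root / name)
        copied.append(new_root / name)
    return copied
```

Failing when a target directory already exists is deliberate: it guards against accidentally copying into a node that already holds its own FlowFiles.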
01-22-2024
08:42 PM
@Dave0x1, I'm happy to see that you resolved your issue. Can you kindly mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future?
01-18-2024
11:05 AM
@Adhitya This is a rather old post. Can you provide details on your specific setup (the processors and configurations used, including scheduling), info about your data, and what you expect versus what you are seeing? Is your NiFi a cluster setup or standalone? How is source data ingested into your NiFi for this dataflow? Typically issues like this are related to dataflow design, but there is not enough info here to reproduce or make suggestions yet. Thanks, Matt