Member since 07-30-2019
3392 Posts
1618 Kudos Received
1001 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 417 | 11-05-2025 11:01 AM |
| | 308 | 11-05-2025 08:01 AM |
| | 447 | 11-04-2025 10:16 AM |
| | 666 | 10-20-2025 06:29 AM |
| | 806 | 10-10-2025 08:03 AM |
07-07-2025
05:54 AM
@MK77 First, let's clarify the ZooKeeper (ZK) elected roles in Apache NiFi.

Primary: ZK elects one node in the cluster as the "Primary" node. Processor components on the canvas configured with Execution=Primary node will only get scheduled on that elected primary node. No other nodes will schedule these processors to execute.

Cluster Coordinator: ZK elects one of the nodes as the cluster coordinator. Other nodes learn which node is the elected cluster coordinator from ZK. All nodes send node heartbeats to the cluster coordinator to form the cluster.

Any node in the NiFi cluster can be assigned either or both of these roles. There is no guarantee that the same node(s) will always be assigned these roles. Even after the NiFi cluster is formed and roles are assigned, which nodes hold these roles can change.

The flow.json.gz contains the dataflows on the canvas that are loaded on startup. The flow.xml.gz is only loaded if the flow.json.gz is missing. If NiFi loads the dataflow from the flow.xml.gz, it will generate a flow.json.gz from that flow.xml.gz.

Now on to your problem. Neither of the log lines you shared points to any problem:

Invalid State Cannot replicate request to Node <node-hostname:port> because the node is not connected

This log line simply tells you that this node can't replicate a request to another node yet because that node has not yet connected to the cluster.

o.a.n.w.a.c.IllegalClusterStateExceptionMapper org.apache.nifi.cluster.manager.exception.IllegalClusterStateException: The Flow Controller is initializing the Data Flow.. Returning Conflict response.

This simply tells you that the flow.json.gz is still being initialized (loaded). This process needs to complete before the node finishes startup and can join the cluster. Depending on which Apache NiFi version you are running and the size of your dataflow, this can take some time to complete. What is the complete version of NiFi you are using?

Without your full logs, it is not possible to tell from what has been shared what is going on, or even whether there really is any corruption in your flow.json.gz.

One thing you can do is configure your NiFi to start up with all components on your canvas stopped instead of in their last known state. This can be helpful if you recently added a new dataflow that is perhaps causing issues initializing at startup. This is achieved by changing the following setting in the nifi.properties file:

nifi.flowcontroller.autoResumeState=false

Save a backup of your flow.json.gz before starting NiFi after changing this setting. The saved flow.json.gz will have the original saved state (Running, Stopped, Disabled) of all the components. If your NiFi cluster starts fine after making this change, you can restart your dataflows to see if any are having issues.

Beyond the above suggestion, there is not enough information shared to suggest anything else.

Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
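The backup-then-edit step described above could be scripted roughly as follows. This is only a sketch under assumptions: the function name disable_auto_resume, the backup file name, and the idea that flow.json.gz and nifi.properties sit in the same conf directory are all hypothetical, so adjust the paths to your installation before relying on it.

```python
import shutil
from pathlib import Path


def disable_auto_resume(conf_dir: Path) -> Path:
    """Back up flow.json.gz, then set nifi.flowcontroller.autoResumeState=false.

    conf_dir is a hypothetical directory assumed to hold both files.
    Returns the path of the backup copy.
    """
    flow = conf_dir / "flow.json.gz"
    backup = conf_dir / "flow.json.gz.bak"  # hypothetical backup name
    shutil.copy2(flow, backup)  # preserves the originally saved component states

    props = conf_dir / "nifi.properties"
    key = "nifi.flowcontroller.autoResumeState"
    lines = props.read_text().splitlines()
    found = False
    for i, line in enumerate(lines):
        if line.strip().startswith(key + "="):
            lines[i] = key + "=false"  # replace the existing value
            found = True
    if not found:
        lines.append(key + "=false")  # property was absent, so add it
    props.write_text("\n".join(lines) + "\n")
    return backup
```

Run it while NiFi is stopped, for example disable_auto_resume(Path("/path/to/nifi/conf")), and keep the backup so the original component states can be restored later.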
07-03-2025
06:36 AM
@HoangNguyen There isn't an existing processor included with Apache NiFi capable of performing a UNION ALL against the contents of multiple FlowFiles. JoinEnrichment is the only processor that can modify the contents of one FlowFile using the contents of another, but it only handles two FlowFiles (the original FlowFile and the enrichment FlowFile) in a single execution. The other record-oriented processors all perform actions against the individual records in a FlowFile.

You may need to develop your own custom processor for such a task: something like the MergeRecord processor that bins like FlowFiles and then performs a UNION ALL on those binned FlowFiles. You could also raise a Jira in Apache NiFi (https://issues.apache.org/jira/browse/NIFI) asking for a processor that can perform such an operation, and maybe someone would attempt to build it if there is enough Apache community interest.

You could also explore what Cloudera offers its customers in terms of professional services that could help with building custom processors for Cloudera Flow Management offerings based on Apache NiFi.

Thank you, Matt
07-03-2025
06:14 AM
@Rohit1997jio https://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html

Your Quartz cron "0-30 */6 * * * ?" translates to: execute every second from second 0 through second 30, in every 6th minute starting at minute 0, of every hour ...

Per the Quartz tutorial linked above, a * before the / is treated the same as a start value of 0, so */6 in the minutes field matches the same minutes as 0/6. I would still write the start value explicitly so the intent is obvious. With 0/6, the processor would get scheduled at minutes 0/6/12/18/24/30/36/42/48/54 every hour. If you want it to start at 6 minutes, you would use 6/6, which would schedule the processor at minutes 6/12/18/24/30/36/42/48/54 every hour (you would, however, have a 12 minute gap between minute 54 of each hour and minute 6 of the next hour with this config). Also note that the 0-30 in the seconds field schedules the processor 31 times within each matching minute.

Also keep in mind that scheduling does not necessarily mean execution at the same time. NiFi has a Max Timer Driven Thread pool from which threads are given out to scheduled processors. With very large flows or processors with long running threads, a scheduled processor may need to wait for a thread to become available before it can actually execute.

Thank you, Matt
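To see which minutes an increment expression such as 0/6 or 6/6 selects, the expansion can be sketched as below. This is a simplified illustration, not the Quartz parser: the helper name expand_increment is hypothetical and it only handles the numeric start/step form used in the examples above.

```python
def expand_increment(field: str, max_value: int = 59) -> list[int]:
    """Expand a 'start/step' cron field into the values it matches.

    Only the numeric 'start/step' form is handled in this sketch.
    """
    start_text, step_text = field.split("/")
    start, step = int(start_text), int(step_text)
    # Values run from start up to max_value (59 for minutes), stepping by step.
    return list(range(start, max_value + 1, step))


print(expand_increment("0/6"))  # minutes selected by 0/6
print(expand_increment("6/6"))  # minutes selected by 6/6
```

The second list has no entry for minute 0, which is where the 12 minute gap between minute 54 and minute 6 of the next hour comes from.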
07-03-2025
05:56 AM
@NifiEnjoyer Welcome to the community. As this is an old thread about the deprecation of NiFi templates in Apache NiFi 2, it would be better to start a new community question with your query about downloading and uploading flow definitions. You'll want to include your source and destination Apache NiFi versions in your question details. Feel free to tag @MattWho in your new community question. Thank you, Matt
07-02-2025
08:31 AM
@HoangNguyen All the ForkEnrichment processor does is add two specific FlowFile attributes to each FlowFile it outputs: "enrichment.group.id" and "enrichment.role". The JoinEnrichment processor depends on receiving two FlowFiles with matching "enrichment.group.id" values, one with "enrichment.role" = ORIGINAL and the other with "enrichment.role" = ENRICHMENT.

So you can, for example, chain two enrichments: fork the starting FlowFile and join the first enrichment, then use ForkEnrichment again to generate the needed FlowFile attributes for the second JoinEnrichment operation.

Thank you, Matt
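The pairing rule can be illustrated with a small check over two attribute maps. This is a sketch of the rule as described in these answers, not NiFi's implementation: the function can_join is hypothetical, FlowFile attributes are modeled as plain dicts, and the attribute names used are "enrichment.group.id" and "enrichment.role".

```python
def can_join(a: dict[str, str], b: dict[str, str]) -> bool:
    """Return True if two attribute maps look like a joinable pair.

    A pair is joinable when both share the same enrichment.group.id and
    one has the ORIGINAL role while the other has the ENRICHMENT role.
    """
    group_a = a.get("enrichment.group.id")
    group_b = b.get("enrichment.group.id")
    same_group = group_a is not None and group_a == group_b
    roles = {a.get("enrichment.role"), b.get("enrichment.role")}
    return same_group and roles == {"ORIGINAL", "ENRICHMENT"}


original = {"enrichment.group.id": "g1", "enrichment.role": "ORIGINAL"}
enrichment = {"enrichment.group.id": "g1", "enrichment.role": "ENRICHMENT"}
other_group = {"enrichment.group.id": "g2", "enrichment.role": "ENRICHMENT"}

print(can_join(original, enrichment))   # same group, complementary roles
print(can_join(original, other_group))  # group ids differ
```

A FlowFile that has already been joined once no longer forms such a pair with a new enrichment, which is why a second ForkEnrichment is needed before the second join.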
07-02-2025
05:26 AM
@Rohit1997jio The content of a NiFi FlowFile does not live in NiFi heap memory space. Only the FlowFile metadata/attributes are held in NiFi heap memory. Even then, there are per-connection thresholds at which swap files are created to reduce that heap usage. Some processors may need to load content into heap memory when they execute against a FlowFile(s).

Before making recommendations on your ConsumeKafkaRecord processor configuration, more information about your NiFi and Kafka topic is needed. Are you running a multi-node NiFi cluster or a single instance of NiFi? If a cluster, how many nodes make up your NiFi cluster? How many partitions are set up on the target Kafka topic?

Kafka partitions are assigned by Kafka to the different consumers in a consumer group. So let's say you have 10 partitions on your Kafka topic, 1 NiFi instance, and a ConsumeKafkaRecord configured with 1 concurrent task. All 10 of these partitions would be assigned to that one consumer. When the ConsumeKafkaRecord executes, it will consume from one of those partitions, on the next execution from the next partition, and so on. This is likely why you are not seeing all the Kafka messages consumed when you schedule the processor to execute only once every 4 hours. Even if you were to set concurrent tasks to 10 on the ConsumeKafkaRecord processor, the scheduler is only going to allow one execution every 4 hours. So in this case you would be best served by setting 10 concurrent tasks and adjusting your Quartz cron schedule so it schedules every second for 10 seconds every 4 hours.

Also keep in mind the "Max Poll Records" setting, as it controls the maximum number of records (messages) added to the single record FlowFile created during each execution. If you have a lot of records, you may consider increasing how long it gets scheduled every 4 hours, to maybe 30 seconds, to make sure you get all messages from every partition.

Now assume you have a multi-node NiFi cluster with 5 nodes, for example, your ConsumeKafkaRecord processor is configured with a group.id, and the topic has 10 partitions. You would set concurrent tasks to 2 (2 consumers x 5 nodes = 10 consumers in the consumer group). Kafka will assign one partition to each of these 10 consumers in the consumer group.

Hope this helps you configure your ConsumeKafkaRecord processor so you can be successful with your requirement.

Thank you, Matt
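The consumer arithmetic in the two scenarios above can be sketched as follows. The function partitions_per_consumer is hypothetical and assumes partitions are spread as evenly as possible across consumers; the real assignment is made by Kafka and depends on the assignor in use.

```python
def partitions_per_consumer(partitions: int, nodes: int, concurrent_tasks: int) -> list[int]:
    """Return how many partitions each consumer would own under an even spread.

    Consumers in the group = NiFi nodes x concurrent tasks per node.
    The even spread is an assumption made for illustration only.
    """
    consumers = nodes * concurrent_tasks
    base, extra = divmod(partitions, consumers)
    # The first 'extra' consumers take one additional partition each.
    return [base + 1 if i < extra else base for i in range(consumers)]


print(partitions_per_consumer(10, nodes=1, concurrent_tasks=1))  # single instance
print(partitions_per_consumer(10, nodes=5, concurrent_tasks=2))  # 5-node cluster
```

In the single-instance case one consumer owns all 10 partitions, while in the 5-node case each of the 10 consumers owns exactly one.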
07-01-2025
05:50 AM
@Bhar Can you share more detail? Without it, I would only be making random guesses. What version of Apache NiFi are you using? Is this a single instance of NiFi or a multi-node NiFi cluster? How is your MergeContent processor configured? Thank you, Matt
07-01-2025
05:44 AM
@HoangNguyen Welcome to the community. It would be very difficult to provide any suggestions with the limited information you have shared. Please share more detail about your use case and what you are trying to accomplish.

The JoinEnrichment processor is used in conjunction with the ForkEnrichment processor. For a JoinEnrichment processor to join two NiFi FlowFiles, both FlowFiles must have a matching group id set in an "enrichment.group.id" attribute, and each must also have an "enrichment.role" attribute set appropriately (ORIGINAL on the FlowFile to be enriched and ENRICHMENT on the FlowFile containing the enrichment data).

Thank you, Matt
06-23-2025
09:04 AM
@melek6199 What you have is an authorization issue. When you access your multi-node NiFi cluster, you are authorized only on the node on which you authenticated. When you make a request like List Queue or Empty Queue, that node sends a request to all the other nodes to list or empty the connection queue. This means that the nodes themselves need to be authorized to ask other nodes to share back their queue listing or to empty their queues.

All 4 of your NiFi nodes should already have been authorized for "proxy user requests", but in order to list or empty queues, your nodes will need these additional authorizations:

"view the data": authorizes a node(s) to list the data from other nodes (the user must also be authorized)
"modify the data": authorizes a node(s) to empty a connection queue on other nodes

You can see from the nifi-user.log output you shared the identity and the policy that is missing to perform this action on the specific connection UUID:

Node x.x.x.x:8443 is unable to fulfill this request due to: Unable to modify the data for Processor with ID d3a802c6-0196-1000-ffff-ffff90fdc7b8

You would have seen this same exception for all but one node when you made the request to empty the queue.

Authorizations are inherited from parent process groups unless explicitly set on the individual component directly. So you don't need to authorize your nodes for "view the data" and "modify the data" on the connection "d3a802c6-0196-1000-ffff-ffff90fdc7b8" directly; instead, set these authorizations on the parent process group. Keep in mind that child process groups also inherit from parent process groups unless a policy is explicitly set on that child process group. Typically you would set these authorization policies on the root process group (top level).

You'll also notice that when you view policies on a component, it tells you if it is inheriting policies, and if you choose to set explicit policies on that component, it asks whether you want to copy the inherited policy before modifying it.

Thank you, Matt
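The inheritance rule described above can be pictured as a walk up the parent chain until an explicit policy is found. This is a toy model only: the Component class, the effective_identities function, and the node identity strings are hypothetical and do not reflect NiFi's authorizer code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Component:
    """Toy model of a canvas component with optional explicit policies."""
    name: str
    parent: Optional["Component"] = None
    # Maps a policy name (e.g. "modify the data") to authorized identities.
    policies: dict[str, set[str]] = field(default_factory=dict)


def effective_identities(component: Component, policy: str) -> set[str]:
    """Walk up the parent chain and return the first explicit policy found."""
    node: Optional[Component] = component
    while node is not None:
        if policy in node.policies:
            return node.policies[policy]
        node = node.parent
    return set()  # nothing set anywhere in the chain: nobody is authorized


nodes = {"CN=node1", "CN=node2", "CN=node3", "CN=node4"}  # hypothetical identities
root = Component("root", policies={"view the data": nodes, "modify the data": nodes})
child_group = Component("child-group", parent=root)
connection = Component("connection", parent=child_group)

print(effective_identities(connection, "modify the data") == nodes)
```

Setting the two data policies once on the root process group is enough for every connection beneath it, unless a component in between carries its own explicit policy.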
06-12-2025
09:19 AM
Hello @Bdeyyam Cloudera Manager cumulative hotfix release information can be found in the Cloudera documentation under "Cumulative hotfixes". From the rpm version shared above, I can see those are from Cloudera Manager 7.11.3 Cumulative hotfix 4. Hope this helps you, Matt