About MattWho

MattWho · ‎07-07-2025

@MK77 First, let's clarify the ZooKeeper (ZK) elected roles in Apache NiFi.

Primary Node: ZK elects one node in the cluster as the "Primary" node. Processor components on the canvas configured with Execution=Primary node are scheduled only on that elected primary node. No other nodes will schedule these processors to execute.

Cluster Coordinator: ZK elects one of the nodes as the cluster coordinator. Other nodes learn which node is the elected cluster coordinator from ZK. All nodes send node heartbeats to the cluster coordinator to form the cluster.

Any node in the NiFi cluster can be assigned either or both of these roles. There is no guarantee that the same node(s) will always be assigned these roles. Even after the NiFi cluster is formed and roles are assigned, which nodes hold these roles can change.

The flow.json.gz contains the dataflows on the canvas that are loaded on startup. The flow.xml.gz is only loaded if the flow.json.gz is missing. If NiFi loads the dataflow from the flow.xml.gz, it will generate a flow.json.gz from that flow.xml.gz.

Now on to your problem. Neither of the log lines you shared points to any problem:

Invalid State Cannot replicate request to Node <node-hostname:port> because the node is not connected

This log line simply tells you that this node can't replicate a request to another node yet because it has not yet connected to the cluster.

o.a.n.w.a.c.IllegalClusterStateExceptionMapper org.apache.nifi.cluster.manager.exception.IllegalClusterStateException: The Flow Controller is initializing the Data Flow.. Returning Conflict response.

This simply tells you that the flow.json.gz is still being initialized (loaded). This process needs to complete before the node finishes startup and can join the cluster. Depending on which Apache NiFi version you are running and the size of your dataflow, this can take some time to complete. What is the complete version of NiFi you are using?

Without your full logs, it is not possible to tell from what has been shared what is going on, or even whether there really is any corruption in your flow.json.gz.

One thing you can do is configure your NiFi to start up with all components on your canvas stopped instead of in their last known state. This can be helpful if you have recently added a new dataflow that is perhaps causing issues initializing at startup. This is achieved by changing the following setting in the nifi.properties file:

nifi.flowcontroller.autoResumeState=false

Save a backup of your flow.json.gz before starting NiFi after changing this setting. The saved flow.json.gz will have the original saved state (Running, Stopped, Disabled) of all the components. If your NiFi cluster starts fine after making this change, you can restart your dataflows to see if any are having issues.

Beyond the above suggestion, there is not enough information shared to suggest anything else.

Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
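For anyone who wants to script the backup-then-edit step described above, here is a minimal Python sketch. It assumes that flow.json.gz and nifi.properties both sit in the same conf directory (check your own install), and the helper names `set_property` and `prepare_stopped_start`, as well as the `.bak` backup name, are made up for this illustration. Run it only while NiFi is stopped on that node.

```python
import shutil
from pathlib import Path


def set_property(properties_text: str, key: str, value: str) -> str:
    """Return properties_text with key set to value (replace existing entry or append)."""
    lines = properties_text.splitlines()
    updated = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Ignore commented-out entries; match the key name exactly.
        if not stripped.startswith("#") and stripped.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            updated = True
    if not updated:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def prepare_stopped_start(conf_dir: str) -> Path:
    """Back up flow.json.gz, then set autoResumeState=false in nifi.properties.

    conf_dir is an assumption: the directory holding both files on this node.
    """
    conf = Path(conf_dir)
    backup = conf / "flow.json.gz.bak"  # hypothetical backup name
    shutil.copy2(conf / "flow.json.gz", backup)  # preserves the original component states
    props = conf / "nifi.properties"
    props.write_text(
        set_property(props.read_text(), "nifi.flowcontroller.autoResumeState", "false")
    )
    return backup
```

Calling `prepare_stopped_start("/path/to/nifi/conf")` on each node before restarting would leave the backup next to the original file, so the original component states can be restored later.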

NifiEnjoyer · ‎07-04-2025

It didn't take long, so I'll write it here. All you need to do is start adding a process group, then click the browse button, select the JSON file, and that's it. I hope this will be useful to someone.

MattWho · ‎07-03-2025

@HoangNguyen There isn't an existing processor included with Apache NiFi capable of performing a UNION ALL against the contents of multiple FlowFiles. JoinEnrichment is the only processor that can modify the contents of one FlowFile using the contents of another, but it only handles two FlowFiles (the original FlowFile and the enrichment FlowFile) in a single execution. The other record-oriented processors all perform actions against individual records in a FlowFile.

You may need to develop your own custom processor for such a task: something like the MergeRecord processor that bins like FlowFiles and then performs a UNION ALL on those binned FlowFiles.

You could also raise a Jira in Apache NiFi (https://issues.apache.org/jira/browse/NIFI) asking for a processor that can perform such an operation, and maybe someone would attempt to build it if there is enough Apache community interest. You could also explore what Cloudera offers its customers in terms of professional services that could help with building custom processors for Cloudera Flow Management offerings based on Apache NiFi.

Thank you, Matt
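To make the "bin, then UNION ALL" idea concrete, here is a toy Python sketch of the merge step only. It is not NiFi code and not how MergeRecord is implemented; each FlowFile is modeled as a plain list of record dictionaries, and the schema check is an assumption about what such a custom processor might want to enforce.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def union_all(flowfiles: Iterable[List[Record]]) -> List[Record]:
    """Concatenate the records of several FlowFiles, keeping duplicates (UNION ALL)."""
    merged: List[Record] = []
    expected_fields = None
    for records in flowfiles:
        for record in records:
            fields = set(record)
            if expected_fields is None:
                expected_fields = fields
            elif fields != expected_fields:
                # Assumed rule: every record in the bin must share the same field names.
                raise ValueError(
                    f"schema mismatch: {sorted(fields)} vs {sorted(expected_fields)}"
                )
            merged.append(record)
    return merged


if __name__ == "__main__":
    bin_of_flowfiles = [
        [{"id": 1, "name": "a"}],
        [{"id": 2, "name": "b"}, {"id": 1, "name": "a"}],
    ]
    print(union_all(bin_of_flowfiles))
```

Unlike a plain UNION, nothing is deduplicated here, so the repeated `{"id": 1, "name": "a"}` record appears twice in the result.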

MattWho · ‎07-03-2025

@Rohit1997jio https://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html

Your Quartz cron "0-30 */6 * * * ?" translates to: Execute every second from 0 - 30 seconds, 6 minutes after every minute of every hour ...

I think your issue is using */6 because you are saying 6 minutes after every minute which is effectively the same thing as having just * in the minutes field. If you change this to 0/6, the processor would get scheduled at minutes 0/6/12/18/24/30/36/42/48/54 of every hour. If you want it to start at 6 minutes, you would use 6/6, which would schedule the processor at minutes 6/12/18/24/30/36/42/48/54 of every hour (you would, however, have a 12 minute gap between the last run of each hour and minute 6 of the next hour with this config).

Also keep in mind that scheduling does not necessarily mean execution at the same time. NiFi has a Max Timer Driven Thread pool from which threads are given out to scheduled processors. With very large flows or processors with long running threads, a scheduled processor may need to wait for a thread to become available before it can actually execute.

Thank you, Matt
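The minute lists above can be reproduced with a few lines of Python. This is only an illustrative sketch, not Quartz code: the helper names `expand_increment` and `wrap_gap` are hypothetical, and only the explicit "start/step" form (such as 0/6 or 6/6) is handled.

```python
from typing import List


def expand_increment(field: str, max_value: int = 59) -> List[int]:
    """Expand a 'start/step' cron field into the values it matches within 0..max_value."""
    start_text, step_text = field.split("/")
    start, step = int(start_text), int(step_text)
    if step <= 0 or not 0 <= start <= max_value:
        raise ValueError(f"unsupported field: {field!r}")
    return list(range(start, max_value + 1, step))


def wrap_gap(minutes: List[int]) -> int:
    """Minutes between the last firing of one hour and the first firing of the next."""
    return 60 - minutes[-1] + minutes[0]


if __name__ == "__main__":
    for field in ("0/6", "6/6"):
        minutes = expand_increment(field)
        print(field, minutes, "gap across the hour:", wrap_gap(minutes))
```

For 6/6 the last firing is at minute 54 and the next at minute 6 of the following hour, which is where the 12 minute gap comes from; with 0/6 the gap stays at 6 minutes.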

MattWho · ‎07-02-2025

@HoangNguyen All the ForkEnrichment processor does is add two specific FlowFile attributes to each FlowFile it outputs: "enrichment.group.id" and "enrichment.role".

The JoinEnrichment processor depends on receiving two FlowFiles with matching "enrichment.group.id" values, one with "enrichment.role" = ORIGINAL and the other with "enrichment.role" = ENRICHMENT.

So you can chain the operations, for example: fork the starting FlowFile and join that first enrichment, then use ForkEnrichment again to generate the needed FlowFile attributes for the second JoinEnrichment operation.

Thank you, Matt
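A toy Python model of that attribute handshake may help when reasoning about chained joins. It is not NiFi code: FlowFiles are modeled as attribute dictionaries, `fork` and `can_join` are hypothetical helpers, and the attribute names used are `enrichment.group.id` and `enrichment.role` as listed in the ForkEnrichment documentation.

```python
import uuid
from typing import Dict, Tuple

GROUP_ID = "enrichment.group.id"
ROLE = "enrichment.role"

Attributes = Dict[str, str]


def fork(attributes: Attributes) -> Tuple[Attributes, Attributes]:
    """Model of a fork: emit two copies sharing a new group id, one per role."""
    group_id = str(uuid.uuid4())
    original = {**attributes, GROUP_ID: group_id, ROLE: "ORIGINAL"}
    enrichment = {**attributes, GROUP_ID: group_id, ROLE: "ENRICHMENT"}
    return original, enrichment


def can_join(a: Attributes, b: Attributes) -> bool:
    """Model of the join precondition: same group id and one FlowFile of each role."""
    same_group = a.get(GROUP_ID) is not None and a.get(GROUP_ID) == b.get(GROUP_ID)
    return same_group and {a.get(ROLE), b.get(ROLE)} == {"ORIGINAL", "ENRICHMENT"}


if __name__ == "__main__":
    first_original, first_enrichment = fork({"filename": "input.json"})
    print(can_join(first_original, first_enrichment))
    # Forking the joined result again produces a fresh group id for the second join.
    second_original, second_enrichment = fork(first_original)
    print(can_join(second_original, second_enrichment))
    print(can_join(first_original, second_enrichment))
```

The last check shows why the second fork is needed: a FlowFile left over from the first group cannot be paired with one from the second group.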

MattWho · ‎07-01-2025

@Bhar Can you share more detail? Without it, I would only be making random guesses. What version of Apache NiFi are you using? Is this a single instance of NiFi or a multi-node NiFi cluster? How is your MergeContent processor configured? Thank you, Matt

MattWho · ‎06-23-2025

@melek6199 What you have is an authorization issue. When you access your multi-node NiFi cluster, you are authorized only into the node in which you authenticated. When you make a request like List Queue or Empty Queue, you are making a request from one node to all the other nodes to list or empty the connection queue. This means that the nodes themselves need to be authorized to request other nodes to share back their queue list or empty their target node queues.

All 4 of your NiFi nodes should already have been authorized for "proxy user requests", but in order to list or empty queues, your nodes will need these additional authorizations:

"view the data" - authorizes a node(s) to list the data from other nodes (user must also be authorized)
"modify the data" - authorizes a node(s) to empty a connection queue on other nodes.

You can see from the nifi-user.log output you shared the identity and policy missing to perform this action on the specific connection UUID:

Node x.x.x.x:8443 is unable to fulfill this request due to: Unable to modify the data for Processor with ID d3a802c6-0196-1000-ffff-ffff90fdc7b8

You would have seen this same exception for all but one node when you made the request to empty the queue.

Authorizations are inherited from parent process groups unless explicitly set on the individual component directly. So you don't need to authorize your nodes for "view the data" and "modify the data" on the connection "d3a802c6-0196-1000-ffff-ffff90fdc7b8" directly, but rather set these authorizations on the parent process group. Keep in mind that child process groups also inherit from parent process groups unless a policy is explicitly set on that child process group. Typically you would set these authorization policies on the root process group (top level).

You'll also notice that when you are viewing policies on a component, it will tell you if it is inheriting policies, and if you choose to set explicit policies on that component, it asks whether you want to copy the inherited policy before modifying.

Thank you, Matt
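The inheritance rule described above can be pictured with a small Python model. This is purely conceptual and not how NiFi stores or evaluates policies: the component ids, node identities, and the `effective_policy` helper are all hypothetical, and the lookup covers a single action such as "modify the data".

```python
from typing import Dict, Optional, Set


def effective_policy(
    component_id: str,
    parents: Dict[str, Optional[str]],
    explicit: Dict[str, Set[str]],
) -> Set[str]:
    """Return the identities authorized on a component for one action.

    Walk up the parent chain and use the first explicitly set policy found;
    if none is set anywhere, nobody is authorized.
    """
    current: Optional[str] = component_id
    while current is not None:
        if current in explicit:
            return explicit[current]
        current = parents.get(current)
    return set()


if __name__ == "__main__":
    # Hypothetical canvas: root process group -> child process group -> connection.
    parents = {"connection-1": "child-pg", "child-pg": "root-pg", "root-pg": None}
    explicit = {"root-pg": {"node1", "node2", "node3", "node4"}}
    print(effective_policy("connection-1", parents, explicit))

    # An explicit policy on the child group replaces what it would inherit from root.
    explicit["child-pg"] = {"node1"}
    print(effective_policy("connection-1", parents, explicit))
```

In the second lookup only node1 is authorized on the connection, which is why an explicit policy set on a child process group can produce the "unable to modify the data" error for the other nodes even when the root process group is configured correctly.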

Bdeyyam · ‎06-16-2025

Does anyone have an update on my request?

vats · ‎06-11-2025

Hello @Artem_Kuzin, I looked into this issue and it appears to be a bug in CDP version 7.3.1, which has been resolved in version 7.3.2.0.

MattWho · ‎06-10-2025

@agriff I did not know that you were using the Apache NiFi 2.x release. The component list I provided is from the Apache NiFi 1.x release. NiFi 2.x switched from having numerous client-version-specific Kafka processors to single Kafka processors that now use a KafkaConnectionService controller service component to define the Kafka client version. In Apache NiFi the only connection service included is for the Kafka 3 client. I understand the Kafka 3 client to be backwards compatible to Kafka 2.6, but it sounds like you are having success using it with Kafka 2.5.

Glad to hear you were able to resolve your underlying schema issue.

Setting the Bulletin level on a processor has absolutely nothing to do with the log levels written to the nifi-app.log. It only controls the level at which bulletins are created within the NiFi UI. To change logging within the NiFi logs, you will need to modify the logback.xml configuration file found in the NiFi conf directory.

Thank you, Matt

Member Since	‎07-30-2019 10:41 AM
Last Visited	‎10-23-2025 07:03 AM
Posts	3,387
Kudos received	1613
