Cannot stop a processor in a cluster when a node is down/disconnected

Contributor

I have a NiFi cluster with 3 nodes: node-1, node-2, and node-3. When I ran a job on the cluster, some errors occurred and node-2 disconnected from the cluster. I then went to the admin UI of node-1 and node-3 to stop the job, but I could not stop it.

The UI reports:
Cluster is unable to service request to change flow: Node node-2:8092 is currently disconnected

1 ACCEPTED SOLUTION

Super Mentor

@Kiem Nguyen

In a NiFi cluster, NiFi must keep the flow consistent across all nodes. You can't have nodes in the same cluster running different versions/states of the flow.xml.gz file. NiFi therefore replicates every request (such as stopping one or more processors) to all nodes. While a node is disconnected, that replication cannot occur, so to protect the integrity of the cluster the NiFi canvas is essentially read-only while a node is disconnected.

Your two options are:

1. Reconnect the disconnected node and then stop your dataflow(s).

2. Drop the disconnected node from your cluster via the "Cluster" UI, found in the hamburger menu in the upper right corner of the UI. This makes your cluster a 2 of 2 cluster and returns the UI to full functionality. Once the dropped node is fixed, you will need to restart it to get it to try to rejoin the cluster.

Thanks,

Matt
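
For anyone who would rather script option 2 than click through the Cluster UI, here is a minimal sketch. It assumes the NiFi REST API exposes the cluster summary at `GET /nifi-api/controller/cluster` and allows a node to be removed with `DELETE /nifi-api/controller/cluster/nodes/{id}`; check the REST API documentation for your NiFi version before relying on it. The base URL, the node IDs, and the sample payload are hypothetical, and authentication/TLS is left out entirely.

```python
import urllib.request

# Hypothetical payload shaped like the cluster summary; the node IDs are made up.
SAMPLE_CLUSTER = {
    "cluster": {
        "nodes": [
            {"nodeId": "id-1", "address": "node-1", "apiPort": 8092, "status": "CONNECTED"},
            {"nodeId": "id-2", "address": "node-2", "apiPort": 8092, "status": "DISCONNECTED"},
            {"nodeId": "id-3", "address": "node-3", "apiPort": 8092, "status": "CONNECTED"},
        ]
    }
}


def disconnected_nodes(cluster_entity):
    """Return the node entries whose status is DISCONNECTED."""
    nodes = cluster_entity.get("cluster", {}).get("nodes", [])
    return [node for node in nodes if node.get("status") == "DISCONNECTED"]


def node_url(base_url, node_id):
    """Build the per-node URL (assumed endpoint layout, see the note above)."""
    return f"{base_url.rstrip('/')}/nifi-api/controller/cluster/nodes/{node_id}"


def drop_node(base_url, node_id):
    """Send the DELETE request that drops a node; returns the HTTP status code.

    Not called here: it needs a reachable, unsecured NiFi node to talk to.
    """
    request = urllib.request.Request(node_url(base_url, node_id), method="DELETE")
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Dry run: only print what would be dropped, using a still-connected node as the entry point.
    for node in disconnected_nodes(SAMPLE_CLUSTER):
        print(node["address"], node_url("http://node-1:8092", node["nodeId"]))
```

The request has to go to a node that is still connected (node-1 or node-3 in this thread). As with the UI route, the dropped node must be restarted afterwards before it will try to rejoin.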


3 REPLIES

Contributor
@Matt Clarke

Thanks for your reply. I followed the second option, but I had to remove the data content on the disconnected node before restarting it.

I also found that the node disconnected because a queue became overloaded while the job was executing.
Is it possible to configure the queue size so that it can hold more data? If so, how?

Please let me know if you have a solution for this overloaded-queue problem.

Thanks,

Kiem

Super Mentor

@Kiem Nguyen

I highly recommend starting a new question in Hortonworks Community Connection for this. Diagnosing what caused your node to disconnect, and how to resolve it, is a different topic from how to stop a processor while a node is disconnected.

It would also help to explain what you mean by "overloaded queue" and what makes you think the size of your queue caused your node to disconnect. What error did you see in nifi-app.log on the node that disconnected?

Thanks,

Matt