Cannot stop processor in cluster when a node is down/disconnected

I have a NiFi cluster with 3 nodes: node-1, node-2, node-3. When I ran a job on the cluster, there were some errors and node-2 disconnected from the cluster. I then went to the admin UI of node-1 or node-3 to stop this job, but I cannot stop it.

It shows this message:
Cluster is unable to service request to change flow: Node node-2:8092 is currently disconnected

Accepted Solution

Master Guru

@Kiem Nguyen

In a NiFi cluster, NiFi wants to ensure consistency across all nodes. You can't have each node in a NiFi cluster running a different version/state of the flow.xml.gz file. In a cluster, NiFi replicates a request (such as stopping processor(s)) to all nodes. Since a node is not connected, that replication cannot occur. So to protect the integrity of the cluster, the NiFi canvas is essentially read-only while a node is disconnected.
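To make the replication rule concrete, here is a toy Python model of it. It is not NiFi's implementation: `Node`, `NodeStatus`, `replicate_flow_change`, and `ClusterUnavailableError` are hypothetical names, and each node's flow is reduced to a dict of processor states. The point it illustrates is that a change is applied to every node or to none, so a single disconnected member blocks it.

```python
from dataclasses import dataclass
from enum import Enum


class NodeStatus(Enum):
    CONNECTED = "CONNECTED"
    DISCONNECTED = "DISCONNECTED"


@dataclass
class Node:
    address: str
    status: NodeStatus
    flow_state: dict  # hypothetical stand-in for this node's copy of the flow


class ClusterUnavailableError(Exception):
    pass


def replicate_flow_change(nodes, processor_id, new_state):
    """Apply a flow change to every node, or to none of them (toy model)."""
    # Check membership first so connected nodes never diverge from a
    # disconnected one: nothing is modified if any node is unreachable.
    for node in nodes:
        if node.status is not NodeStatus.CONNECTED:
            raise ClusterUnavailableError(
                f"Node {node.address} is currently disconnected"
            )
    for node in nodes:
        node.flow_state[processor_id] = new_state


cluster = [
    Node("node-1:8092", NodeStatus.CONNECTED, {"proc-a": "RUNNING"}),
    Node("node-2:8092", NodeStatus.DISCONNECTED, {"proc-a": "RUNNING"}),
    Node("node-3:8092", NodeStatus.CONNECTED, {"proc-a": "RUNNING"}),
]

try:
    replicate_flow_change(cluster, "proc-a", "STOPPED")
except ClusterUnavailableError as exc:
    print(exc)  # Node node-2:8092 is currently disconnected

# Dropping the disconnected node from membership lets the change go through.
cluster = [n for n in cluster if n.status is NodeStatus.CONNECTED]
replicate_flow_change(cluster, "proc-a", "STOPPED")
print([n.flow_state["proc-a"] for n in cluster])  # ['STOPPED', 'STOPPED']
```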

Your two options are:

1. Reconnect the disconnected node and then stop your dataflow(s).

2. Drop the disconnected node from your cluster via the "Cluster" UI found in the hamburger menu in the upper right corner of the UI. This will make your cluster a 2-of-2 cluster and will return the UI to full functionality. Once the dropped node is fixed, you will then need to restart it so that it tries to join the cluster again.
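If you prefer to script option 2 instead of using the Cluster UI, the sketch below calls NiFi's REST API with Python's standard library. It assumes the NiFi 1.x endpoints `GET /nifi-api/controller/cluster` (a cluster summary whose `cluster.nodes` entries carry `nodeId` and `status`) and `DELETE /nifi-api/controller/cluster/nodes/{id}` (remove a node); verify both against the REST API documentation for your NiFi version. It also assumes an unsecured cluster over plain HTTP, and the helper names and base URL are hypothetical.

```python
import json
import urllib.request

# Assumed NiFi 1.x REST path; check your version's REST API docs.
# Authentication/TLS is omitted; a secured cluster needs a token or client cert.
CLUSTER_PATH = "/nifi-api/controller/cluster"


def disconnected_node_ids(cluster_entity):
    """Return the nodeIds reported as DISCONNECTED in a cluster summary dict."""
    nodes = cluster_entity.get("cluster", {}).get("nodes", [])
    return [n["nodeId"] for n in nodes if n.get("status") == "DISCONNECTED"]


def fetch_cluster(base_url):
    """GET the cluster summary from a node that is still connected."""
    with urllib.request.urlopen(base_url + CLUSTER_PATH) as resp:
        return json.load(resp)


def build_drop_request(base_url, node_id):
    """Build (but do not send) the DELETE request that removes one node."""
    url = f"{base_url}{CLUSTER_PATH}/nodes/{node_id}"
    return urllib.request.Request(url, method="DELETE")


def drop_disconnected_nodes(base_url):
    """Remove every node that the cluster currently reports as DISCONNECTED."""
    for node_id in disconnected_node_ids(fetch_cluster(base_url)):
        with urllib.request.urlopen(build_drop_request(base_url, node_id)) as resp:
            print(f"dropped {node_id}: HTTP {resp.status}")
```

Usage would be something like `drop_disconnected_nodes("http://node-1:8092")`, pointed at a node that is still connected. Parsing and request building are kept apart from the network calls so they can be checked without a live cluster.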

Thanks,

Matt

@Matt Clarke

Thanks for your reply. I followed the second option, but I had to remove the content data on the disconnected node before restarting it.

I also found that the node disconnected because of an overloaded queue while executing the job.
I'm wondering whether we can configure the queue size so that it can hold more data. How can we do this?

Please help me if you have a solution for this problem (overloaded queue).

Thanks,

Kiem

Master Guru

@Kiem Nguyen

I highly recommend starting a new question in Hortonworks Community Connection for this. Diagnosing what caused your node to disconnect and how to resolve it is a different topic from how to stop a processor while a node is disconnected.

It would also be helpful to explain what you mean by "overloaded queue" and what makes you think the size of your queue caused your node to disconnect. What error did you see in the nifi-app.log on the node that disconnected?
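As a starting point for finding that error, a short Python filter over the log may help. This is a rough sketch: the `logs/nifi-app.log` path assumes the default location under the NiFi install directory, and the pattern (WARN/ERROR level tokens plus any case-insensitive mention of "disconnect") is a hypothetical heuristic, not a list of NiFi's actual log messages.

```python
import re
from pathlib import Path

# Assumed default log location, relative to the NiFi install directory.
LOG_FILE = Path("logs/nifi-app.log")

# Hypothetical heuristic: WARN/ERROR level tokens, or "disconnect" in any case.
PATTERN = re.compile(r"\b(?:ERROR|WARN)\b|(?i:disconnect)")


def suspicious_lines(lines, pattern=PATTERN):
    """Yield (line_number, text) for each line that matches the pattern."""
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


def scan(path=LOG_FILE):
    """Print matching lines from the log file with their line numbers."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        for number, text in suspicious_lines(handle):
            print(f"{number}: {text}")
```

Running `scan()` on the node that disconnected, around the time it dropped out, narrows down which lines are worth posting in the new question.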

Thanks,

Matt