How to troubleshoot NiFi non-secure cluster.

I'm attempting to set up NiFi as a three-node cluster (the NCM on one host and one node on each of two other hosts). I can see heartbeats in all three logs, but when I connect via URL I receive "An unexpected error has occurred; No nodes were able to process this request." In logback.xml I've replaced every 'level="INFO"' with 'level="DEBUG"', but I still only see a timeout for an unspecified request in the NCM log. I've looked at both nifi-user.log and nifi-app.log on all three hosts.

I've gotten a single NiFi instance working on a different host. After this, I need to set up a five-node kerberized NiFi cluster, but I'm stuck for now.

Anybody know how to troubleshoot further? Perhaps via API calls?

1 ACCEPTED SOLUTION

Re: How to troubleshoot NiFi non-secure cluster.

Contributor

Clustering is bidirectional: there is the heartbeat mechanism you are seeing, which runs across the protocol port you have configured, and there is the NCM replicating requests and communicating with the nodes via their REST API, which is served on the same host and port as each node's UI. Each node relays this host and port to the manager as part of the joining process.

The primary properties beyond those for the clustering protocol are:

  • nifi.web.http.host
  • nifi.web.http.port

which would need to be accessible from the NCM.
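As a rough sketch of how those two properties could be checked on a node, the following Python reads the text of a nifi.properties file and flags a web host that is empty or points at loopback. The helper names (parse_properties, check_web_host) and the sample values are made up for illustration, and the parsing handles only simple key=value lines.

```python
def parse_properties(text):
    """Parse simple Java-style key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_web_host(props):
    """Return a list of problems found in the web host/port settings."""
    problems = []
    host = props.get("nifi.web.http.host", "")
    port = props.get("nifi.web.http.port", "")
    if not host:
        problems.append("nifi.web.http.host is empty")
    elif host in ("localhost", "127.0.0.1"):
        problems.append("nifi.web.http.host points at loopback: " + host)
    if not port.isdigit():
        problems.append("nifi.web.http.port is not a number: " + repr(port))
    return problems


# Hypothetical fragment of a node's nifi.properties, used only as sample input.
sample = "# web properties\nnifi.web.http.host=\nnifi.web.http.port=8080\n"
for problem in check_web_host(parse_properties(sample)):
    print(problem)
```

To run it against a real node, read the node's own nifi.properties into a string and pass that in place of the sample text.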

For the request that results in "... No nodes were able to process this request", there should also be a stack trace on the NCM that shows the address(es) it expects to be available. Verify connectivity to those sockets from your NCM. If nifi.web.http.host is not explicitly set, it will default to localhost, which the manager will then interpret incorrectly when it is transmitted with the heartbeat.
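One way to verify connectivity to those sockets is a plain TCP connection attempt from the NCM host. The sketch below uses only the Python standard library. The function names are hypothetical, and the addresses you pass in should be the ones reported in your own NCM log.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_nodes(addresses, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {addr: can_connect(addr[0], addr[1], timeout) for addr in addresses}
```

For example, check_nodes([("node1.example.com", 8080), ("node2.example.com", 8080)]), with placeholder hostnames and ports, would report which node web ports are reachable from the machine it runs on. A successful TCP connection only shows that the port is open; it does not prove the REST API behind it is healthy.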

If that does not turn up anything, sharing the web and clustering properties of your NCM and one of your nodes may help to debug a bit further.

3 REPLIES

Re: How to troubleshoot NiFi non-secure cluster.

Master Guru

Just to add to what Aldrin said above: if you do not have nifi.web.http.host populated in the nifi.properties file on each of your nodes with either an IP address or a hostname that is reachable from the NCM, it is likely that your nodes are resolving that value to localhost. In this case the app log on the NCM would contain log lines showing localhost in the URL string it is using to send messages to the nodes. That will of course fail, and the result would be the error you are describing. Editing anything in the nifi.properties file requires a restart to take effect.
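A quick way to look for that symptom is to scan the NCM's app log for URLs that point at loopback. This is only a sketch: it assumes nothing about the log format beyond the URL appearing somewhere in the line, the function name is hypothetical, and the sample lines in the code are invented rather than real NiFi output.

```python
import re

# Match http(s) URLs whose host is exactly localhost or 127.0.0.1.
LOOPBACK_URL = re.compile(
    r"https?://(?:localhost|127\.0\.0\.1)(?![\w.-])(?::\d+)?\S*"
)


def find_loopback_urls(lines):
    """Return (line_number, url) pairs for every loopback URL in lines."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in LOOPBACK_URL.finditer(line):
            hits.append((number, match.group(0)))
    return hits


# Invented sample lines, not actual NiFi log output.
sample = [
    "request to http://localhost:8080/some/path timed out",
    "request to http://node1.example.com:8080/some/path succeeded",
]
for number, url in find_loopback_urls(sample):
    print(number, url)
```

Against a real log, pass the open file object for nifi-app.log on the NCM instead of the sample list, since iterating a file yields its lines.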

Re: How to troubleshoot NiFi non-secure cluster.

Thank you. This gave me enough to understand how to proceed. The nodes could see the NCM but not vice versa. It was indeed the "localhost" situation mentioned above.