
Can't connect to the NiFi cluster after starting NiFi

Rising Star

In the nifi-app.log file, I am seeing this as the only WARN-level error:

WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket due to: java.net.ConnectException: Connection refused

org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket due to: java.net.ConnectException: Connection refused

at org.apache.nifi.cluster.protocol.impl.NodeProtocolSenderImpl.createSocket(NodeProtocolSenderImpl.java:145) ~[nifi-framework-cluster-protocol-0.6.0.1.2.0.1-1.jar:0.6.0.1.2.0.1-1]

at org.apache.nifi.cluster.protocol.impl.NodeProtocolSenderImpl.requestConnection(NodeProtocolSenderImpl.java:68) ~[nifi-framework-cluster-protocol-0.6.0.1.2.0.1-1.jar:0.6.0.1.2.0.1-1]

at org.apache.nifi.cluster.protocol.impl.NodeProtocolSenderListener.requestConnection(NodeProtocolSenderListener.java:93) ~[nifi-framework-cluster-protocol-0.6.0.1.2.0.1-1.jar:0.6.0.1.2.0.1-1]

at org.apache.nifi.controller.StandardFlowService.connect(StandardFlowService.java:671) [nifi-framework-core-0.6.0.1.2.0.1-1.jar:0.6.0.1.2.0.1-1]

at org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:418) [nifi-framework-core-0.6.0.1.2.0.1-1.jar:0.6.0.1.2.0.1-1]

at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:774) [nifi-jetty-0.6.0.1.2.0.1-1.jar:0.6.0.1.2.0.1-1]

at org.apache.nifi.NiFi.<init>(NiFi.java:137) [nifi-runtime-0.6.0.1.2.0.1-1.jar:0.6.0.1.2.0.1-1]

at org.apache.nifi.NiFi.main(NiFi.java:227) [nifi-runtime-0.6.0.1.2.0.1-1.jar:0.6.0.1.2.0.1-1]

Caused by: java.net.ConnectException: Connection refused

at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.8.0_11]

at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345) ~[na:1.8.0_11]

at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[na:1.8.0_11]

at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[na:1.8.0_11]

at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.8.0_11]

at java.net.Socket.connect(Socket.java:589) ~[na:1.8.0_11]

at java.net.Socket.connect(Socket.java:538) ~[na:1.8.0_11]

at java.net.Socket.<init>(Socket.java:434) ~[na:1.8.0_11]

at java.net.Socket.<init>(Socket.java:211) ~[na:1.8.0_11]

at org.apache.nifi.io.socket.SocketUtils.createSocket(SocketUtils.java:59) ~[nifi-socket-utils-0.6.0.1.2.0.1-1.jar:0.6.0.1.2.0.1-1]

at org.apache.nifi.cluster.protocol.impl.NodeProtocolSenderImpl.createSocket(NodeProtocolSenderImpl.java:143) ~[nifi-framework-cluster-protocol-0.6.0.1.2.0.1-1.jar:0.6.0.1.2.0.1-1]

... 7 common frames omitted

Any idea or help would be greatly appreciated.

1 ACCEPTED SOLUTION

Rising Star

I became ill; sorry for the delay in replying.

Good news: I have the NCM up and the 3 slave nodes running.

Thank you, Matt, for your last update (and for responding to all of the questions I have posted on this site). It helped me reason out the rest of the problems and errors once I found the line you said to look for in the nifi-app.log file: "2016-06-03 ... INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following ..." After that, I removed the flow.xml.gz file from the slave nodes. Those two tips gave me a fighting chance to figure this all out.

Lessons learned: making sure the 'nifi' user owns the directories and files, and starting NiFi while logged in as the 'nifi' user, makes debugging the messages in the three log files much easier. Also, don't let anyone start NiFi as root before you have had a chance to set up the configuration files for a clustered environment.

Thanks again Matt 🙂

PJ




@PJ Moutrie

Have you verified that the firewall is open on the NiFi nodes?
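For instance, a quick reachability check from the command line could look like this (the hostnames and ports below are placeholders standing in for whatever is configured in your nifi.properties files):

# From each node, check that the NCM's cluster manager protocol port is reachable:
nc -vz ncm.example.com 9001

# From the NCM, check each node's protocol port and HTTP port:
nc -vz node1.example.com 9002
nc -vz node1.example.com 8080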

Master Mentor

Are the NCM and Node(s) in this cluster configured for HTTPS or HTTP? The NCM needs to be able to reach the HTTP(S) port and the node protocol port configured in the nifi.properties file on the Node(s), and each Node needs to be able to reach the cluster manager protocol port configured in the nifi.properties file on the NCM. Thanks,

Matt
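One more quick check, related to the stack trace above: "Connection refused" usually means nothing is listening on the target port at all. A minimal sketch, assuming a placeholder manager protocol port of 9001, is to confirm on the NCM host that NiFi is actually bound to that port:

# On the NCM host, confirm something is listening on the cluster manager protocol port:
ss -tlnp | grep 9001
# (or, on older systems)
netstat -tlnp | grep 9001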

Rising Star

Hi to you both,

Thank you both for replying.

To reply to the first response: there are no firewalls between the nodes.

Hi Matt: these are all configured for HTTP only. I have not defined any HTTPS-related port numbers anywhere in the configuration, because I was told not to secure this setup.

Also, I am noticing now that flow.xml.gz seems to be empty on the Node from which I am starting NiFi...

-rw-r--r-- 1 nifi nifi 0 Jun 2 15:30 flow.xml.gz

In the past, this file looked like this:

-rw-r--r-- 1 nifi nifi 1112 Jun 2 11:25 flow.xml.gz

Could this be a factor as well?
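As a side note, a non-empty flow.xml.gz can be inspected like this (the path assumes NiFi's default conf/ location); a 0-byte file simply means no flow has been written yet:

# print the first few lines of the gzipped flow definition
gunzip -c conf/flow.xml.gz | head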

After issuing the start command I can access the DataFlow UI, but there is no indication that it sees a clustered environment.

Thanks for any additional feedback.

Master Mentor

A fresh install of NiFi has no flow.xml.gz file until after it is started for the first time. Are these fresh NiFi installs or installations that were previously run standalone?

If they were previously run standalone, you can't simply tell them they are Nodes and an NCM and expect it to work. Your NCM does not run with a flow.xml.gz like your Nodes and standalone instances do; the NCM uses a flow.tar file. The flow.tar is created on startup and contains an empty flow.xml. When you started your Node (with its existing flow.xml.gz file), it would have communicated with the NCM but been rejected, because the flow on the Node would not have matched what was on the NCM. If you are looking to migrate from a standalone instance to a cluster, I would suggest reading this: https://community.hortonworks.com/content/kbentry/9203/how-to-migrate-a-standalone-nifi-into-a-nifi-...
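To make the distinction concrete, here is roughly where the two kinds of flow state live in a default layout (the install path is illustrative; check the flow configuration entries in nifi.properties for the real locations):

# On the NCM, the cluster flow is kept in a tar archive:
ls -l /opt/nifi/conf/flow.tar

# On Nodes and standalone instances, the flow is a gzipped XML file:
ls -l /opt/nifi/conf/flow.xml.gz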

Let me make sure I understand your environment:

1. You have two different installations of NiFi.

2. One installation of NiFi is set up and configured to be a non-secure (HTTP) NCM.

3. One installation of NiFi is set up and configured to be a non-secure (HTTP) Node.

4. The # cluster common properties (cluster manager and nodes must have same values) # section in the nifi.properties files on both the NCM and the Node(s) is configured identically (a way to check this is sketched after this reply).

5. In that section, on both, nifi.cluster.protocol.is.secure=false (it cannot be true if running HTTP).

6. The # cluster node properties (only configure for cluster nodes) # section has been configured only on your Node.

- The following properties in that node section are configured:

nifi.cluster.is.node=true

nifi.cluster.node.unicast.manager.address=

nifi.cluster.node.unicast.manager.protocol.port=

and the port matches what you configured in the manager section on your NCM.

7. The # cluster manager properties (only configure for cluster manager) # section has been configured on your NCM only.

- nifi.cluster.is.manager=true

Thanks,

Matt
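One way to double-check item 4, sketched under the assumption of a /opt/nifi install path and an ncm.example.com placeholder hostname, is to compare the cluster protocol properties on the Node against the NCM:

# dump the "cluster common properties" entries on the Node and on the NCM, then compare them
grep '^nifi.cluster.protocol' /opt/nifi/conf/nifi.properties > /tmp/cluster-common.node
ssh nifi@ncm.example.com "grep '^nifi.cluster.protocol' /opt/nifi/conf/nifi.properties" > /tmp/cluster-common.ncm
diff /tmp/cluster-common.node /tmp/cluster-common.ncm   # no output means the sections match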

Rising Star

Hi Matt,

Thanks again for replying. Here is an explanation of the current state.

1) This started out "accidentally" as a standalone config (someone else issued a start NiFi command BEFORE I had configured this for a clustered environment). I followed your instructions from a previous posting and removed the flow.xml.gz file and the templates directory, then set up the configuration for a clustered environment.

2) Starting NiFi with the clustering settings defined in all of the configuration files seemed like a good idea, until we found out after the fact that a) I should not have edited the NiFi properties files while logged in as root (per Dhruv), and b) there was a leftover process that was running a newer version of NiFi. (Hortonworks had told us not to run the current Apache release, 0.6.1, but to instead install 0.6.0 from their software download site.) So we killed the leftover 0.6.1 process (see the check sketched after this list); I am using 0.6.0 and am now editing the files logged in as the 'nifi' user id.

3) Because all of the files were initially owned by root, wherever possible I have issued a chown command, i.e. chown nifi:nifi filename, at least on the node from which I am issuing the start NiFi command. There is one node for the NCM and 3 other slave nodes.

4) What happened yesterday: I issued a start NiFi command, which didn't throw any errors, and the UI came up, but it didn't show that it was attached to a cluster. So I went through the log files to try to figure out why it doesn't see the cluster and found only one WARN-level message, which is the one I shared above.
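The leftover-process check mentioned in item 2 can be as simple as this (the pattern is just illustrative):

# list running NiFi JVMs; the bracketed first character keeps the grep itself out of the results
ps -ef | grep '[n]ifi'
# the full java command line shows which install directory (and therefore which version) each process came from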

Next, I will reply to your questions:

1. You have two different installations of NiFi. - No

2. One installation of NiFi is set up and configured to be a non-secure (HTTP) NCM. - Only one, and it is set up to be non-secure

3. One installation of NiFi is set up and configured to be a non-secure (HTTP) Node. - Only one, and it is set up to be non-secure

4. The # cluster common properties (cluster manager and nodes must have same values) # section in the nifi.properties files on both the NCM and the Node(s) is configured identically.

5. In that section, on both, nifi.cluster.protocol.is.secure=false (it cannot be true if running HTTP). - It is set to false

6. The # cluster node properties (only configure for cluster nodes) # section has been configured only on your Node.

- The following properties in that node section are configured:

nifi.cluster.is.node=true - set to true

nifi.cluster.node.unicast.manager.address= - defined; example below

nifi.cluster.node.unicast.manager.protocol.port= - defined; example below

and the port matches what you configured in the manager section on your NCM.

7. The # cluster manager properties (only configure for cluster manager) # section has been configured on your NCM only.

- nifi.cluster.is.manager=true - yes, set to true, only on the NCM node

Below are the highlights of what I have set on the SLAVE node from which I have been starting NiFi. I masked the actual hostname values and showed the port numbers, which are unique, as 1xxx and 2xxx.

# web properties #

nifi.web.war.directory=./lib

nifi.web.http.host=Axxxxx.xxxxxx.com

nifi.web.http.port=8080

nifi.web.https.host=

nifi.web.https.port=

nifi.web.jetty.working.directory=./work/jetty

nifi.web.jetty.threads=200

# cluster common properties (cluster manager and nodes must have same values) #

nifi.cluster.protocol.heartbeat.interval=5 sec

nifi.cluster.protocol.is.secure=false

nifi.cluster.protocol.socket.timeout=30 sec

nifi.cluster.protocol.connection.handshake.timeout=45 sec

# if multicast is used, then nifi.cluster.protocol.multicast.xxx properties must be configured #

nifi.cluster.protocol.use.multicast=false

nifi.cluster.protocol.multicast.address=

nifi.cluster.protocol.multicast.port=

nifi.cluster.protocol.multicast.service.broadcast.delay=500 ms

nifi.cluster.protocol.multicast.service.locator.attempts=3

nifi.cluster.protocol.multicast.service.locator.attempts.delay=1 sec

# cluster node properties (only configure for cluster nodes) #

nifi.cluster.is.node=true

nifi.cluster.node.address=Axxxxxx.xxxxxx.com

nifi.cluster.node.protocol.port=2xxxx

nifi.cluster.node.protocol.threads=2

# if multicast is not used, nifi.cluster.node.unicast.xxx must have same values as nifi.cluster.manager.xxx #

nifi.cluster.node.unicast.manager.address=Bxxxxx.xxxxxx.com

nifi.cluster.node.unicast.manager.protocol.port=1xxx

# cluster manager properties (only configure for cluster manager) #

nifi.cluster.is.manager=false

nifi.cluster.manager.address=

nifi.cluster.manager.protocol.port=

nifi.cluster.manager.node.firewall.file=

nifi.cluster.manager.node.event.history.size=10

nifi.cluster.manager.node.api.connection.timeout=30 sec

nifi.cluster.manager.node.api.read.timeout=30 sec

nifi.cluster.manager.node.api.request.threads=10

nifi.cluster.manager.flow.retrieval.delay=5 sec

nifi.cluster.manager.protocol.threads=10

nifi.cluster.manager.safemode.duration=0 sec
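For reference, the manager section on the NCM (Bxxxxx.xxxxxx.com) would be expected to be the mirror image of the unicast settings above. A rough check, with the install path illustrative and the port masked as 1xxx to match the values above:

# on the NCM, pull out the manager-side cluster settings
grep -E '^nifi\.cluster\.(is\.manager|manager\.address|manager\.protocol\.port)=' /opt/nifi/conf/nifi.properties
# expected, roughly:
# nifi.cluster.is.manager=true
# nifi.cluster.manager.address=Bxxxxx.xxxxxx.com   (may also be left blank to bind all interfaces)
# nifi.cluster.manager.protocol.port=1xxx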

Master Mentor

You can edit files as root; editing files does not change ownership. You just need to make sure that, at the end of editing, all of the files are owned by the user who will be running your NiFi instances.

Give yourself a fresh start and delete the flow.tar on your NCM and flow.xml.gz and templates dir on your Node.
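A minimal sketch of that fresh start, assuming a /opt/nifi install path and preferring to back files up rather than delete them outright:

# run as root: hand the whole install back to the nifi user
chown -R nifi:nifi /opt/nifi

# on the NCM (flow.tar location depends on your nifi.properties):
mv /opt/nifi/conf/flow.tar /opt/nifi/conf/flow.tar.bak

# on each node:
mv /opt/nifi/conf/flow.xml.gz /opt/nifi/conf/flow.xml.gz.bak
mv /opt/nifi/conf/templates /opt/nifi/conf/templates.bak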

So at the end of configuring your two NiFi installs (one install configured to be NCM and one separate install configured to be a Node), you started your NCM successfully? Looking in the nifi-app.log for your NCM, do you see the following lines:

2016-06-03 ... INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs:

2016-06-03 ... INFO [main] org.apache.nifi.web.server.JettyServer https://Bxxxxx.xxxxxx.com:8080/nifi

You then go to your other NiFi installation configured as your Node and start it. After it has started successfully it will start attempting to send heartbeats to Bxxxxx.xxxxxxx.com on port 1xxx. You should see these incoming heartbeats logged in the nifi-app.log on your NCM. Do you see these?

INFO [Process NCM Request-1] o.a.n.c.p.impl.SocketProtocolListener Received request 411684b2-25cb-461f-978e-fb3bda6a7ef0 from Axxxxx.xxxxxx.com

INFO [Process NCM Request-1] o.a.n.c.manager.impl.WebClusterManager Node Event: (......) 'Connection requested from new node. Setting status to connecting.'

After that, the NCM will either mark the node as connected or give a reason for not allowing it to connect.

If you are not seeing these heartbeats in the NCM's nifi-app.log, then something is blocking the TCP traffic on the specified port. I did notice that in the example above you gave 1xxx as your cluster manager port. Is that port above 1024? Ports below 1024 are privileged and can't be bound by non-root users, so if you are running your NCM as a user other than root (as it sounds from the above), NiFi will fail to bind to that port to listen for these heartbeats.

Matt
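To hunt for those lines without scrolling the whole log, something like this works (log location assumed to be the default logs/ directory under the NiFi install):

# on the NCM, confirm it finished starting:
grep 'JettyServer NiFi has started' logs/nifi-app.log

# on the NCM, look for incoming node connection requests and node events:
grep -E 'SocketProtocolListener Received request|WebClusterManager Node Event' logs/nifi-app.log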

Rising Star

Hi Matt, thank you very much for your reply. I have defined the 1xxx port number higher than 1024. I am walking through the other information you shared.


Master Mentor

Glad I could help and good to hear you are now up and running.