Issues while setting up a secure NiFi cluster, version 1.0.0

Hi @Bryan Bende, I am following the post below to set up a NiFi cluster:

http://bryanbende.com/development/2016/08/17/apache-nifi-1-0-0-authorization-and-multi-tenancy

I am getting this error while starting one of the nodes in the cluster. It looks like a recursive loop. Can you please help troubleshoot this?

Thanks!

Juthika

o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties path to be '/app_2/runtime/nifi/./conf/nifi.properties'
2016-10-19 08:50:18,653 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties path to be '/app_2/runtime/nifi/./conf/nifi.properties'
2016-10-19 08:50:18,654 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Loaded 116 properties from /app_2/runtime/nifi/./conf/nifi.properties
2016-10-19 08:50:48,301 INFO [main] o.a.n.admin.AuditDataSourceFactoryBean Database not built for repository: jdbc:h2:./database_repository/nifi-flow-audit;AUTOCOMMIT=OFF;DB_CLOSE_ON_EXIT=FALSE;LOCK_MODE=3;LOCK_TIMEOUT=25000;WRITE_DELAY=0;AUTO_SERVER=FALSE. Building now...
2016-10-19 08:50:48,595 INFO [main] o.a.nifi.util.FileBasedVariableRegistry Loaded 102 properties from system properties and environment variables
2016-10-19 08:50:48,597 INFO [main] o.a.nifi.util.FileBasedVariableRegistry Loaded 11 properties from './conf/coda.properties'
2016-10-19 08:50:48,598 INFO [main] o.a.nifi.util.FileBasedVariableRegistry Loaded a total of 113 properties. Including precedence overrides effective accessible registry key size is 113
2016-10-19 08:50:48,697 INFO [main] o.a.n.c.repository.FileSystemRepository Maximum Threshold for Container default set to 274719330795 bytes; if volume exceeds this size, archived data will be deleted until it no longer exceeds this size
2016-10-19 08:50:48,700 INFO [main] o.a.n.c.repository.FileSystemRepository Initializing FileSystemRepository with 'Always Sync' set to false
2016-10-19 08:50:49,082 INFO [main] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@3e7940b3 finished recovering records. Performing Checkpoint to ensure proper state of Partitions before updates
2016-10-19 08:50:49,082 INFO [main] org.wali.MinimalLockingWriteAheadLog Successfully recovered 0 records in 14 milliseconds
2016-10-19 08:50:49,103 INFO [main] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@3e7940b3 checkpointed with 0 Records and 0 Swap Files in 20 milliseconds (Stop-the-world time = 4 milliseconds, Clear Edit Logs time = 3 millis), max Transaction ID -1
2016-10-19 08:50:49,180 INFO [main] o.a.n.c.s.server.ZooKeeperStateServer Starting Embedded ZooKeeper Peer
2016-10-19 08:50:49,250 INFO [main] o.apache.nifi.controller.FlowController Checking if there is already a Cluster Coordinator Elected...
2016-10-19 08:50:49,324 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting
2016-10-19 08:50:56,269 WARN [main] o.a.n.c.l.e.CuratorLeaderElectionManager Unable to determine the Elected Leader for role 'Cluster Coordinator' due to org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /nifi/leaders/Cluster Coordinator; assuming no leader has been elected
2016-10-19 08:50:56,270 INFO [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
2016-10-19 08:50:56,378 INFO [main] o.apache.nifi.controller.FlowController It appears that no Cluster Coordinator has been Elected yet. Registering for Cluster Coordinator Role.
2016-10-19 08:50:56,379 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=true] Registered new Leader Selector for role Cluster Coordinator; this node is an active participant in the election.
2016-10-19 08:50:56,380 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting
2016-10-19 08:50:56,384 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for role Cluster Coordinator; this node is an active participant in the election.
2016-10-19 08:50:56,384 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] started
2016-10-19 08:50:56,384 INFO [main] o.a.n.c.c.h.AbstractHeartbeatMonitor Heartbeat Monitor started
2016-10-19 08:50:56,434 WARN [main] o.eclipse.jetty.util.DeprecationWarning Using @Deprecated Class org.eclipse.jetty.servlets.GzipFilter
2016-10-19 08:50:56,435 WARN [main] org.eclipse.jetty.servlets.GzipFilter GzipFilter is deprecated. Use GzipHandler
2016-10-19 08:50:56,438 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext@5529fd4e{/nifi-api,file:///app_2/runtime/nifi/work/jetty/nifi-web-api-1.1.0-SN...}
2016-10-19 08:50:57,106 INFO [main] /nifi-content-viewer No Spring WebApplicationInitializer types detected on classpath
2016-10-19 08:50:57,132 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext@7dbc77ca{/nifi-content-viewer,file:///app_2/runtime/nifi/work/jetty/nifi-web-c...}
2016-10-19 08:50:57,134 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.s.h.ContextHandler@184e5c44{/nifi-docs,null,AVAILABLE}
2016-10-19 08:50:57,199 INFO [main] /nifi-docs No Spring WebApplicationInitializer types detected on classpath
2016-10-19 08:50:57,201 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext@6527aa0{/nifi-docs,file:///app_2/runtime/nifi/work/jetty/nifi-web-docs-1.1.0-S...}
2016-10-19 08:50:57,238 INFO [main] / No Spring WebApplicationInitializer types detected on classpath
2016-10-19 08:50:57,261 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext@6cbb79c3{/,file:///app_2/runtime/nifi/work/jetty/nifi-web-error-1.1.0-SNAPSHOT...}
2016-10-19 08:50:57,269 INFO [main] o.e.jetty.util.ssl.SslContextFactory x509=X509@13157620(coda-nifi-ssl-cert,h=[apsrt3387.ccc.com, apsrt3389.ccc.com, apsrt3388.ccc.com, apsrt3391.ccc.com, apsrt3390.ccc.com, apsrt3402.ccc.com, apsrt3395.ccc.com, apsrt3394.ccc.com, apsrt3393.ccc.com, apsrt3396.ccc.com, apsrt3398.ccc.com, apsrt3397.ccc.com, apsrt3399.ccc.com, apsrt3409.ccc.com, apsrt3408.ccc.com, apsrt3403.ccc.com, apsrt3400.ccc.com, apsrt3401.ccc.com, apsrt3410.ccc.com, ccc.com],w=[]) for SslContextFactory@62a78446(file:///app_2/runtime/nifi/conf/coda-nifi-ssl-cert.pfx,file:///app_2/runt...)
2016-10-19 08:50:57,289 INFO [main] o.eclipse.jetty.server.AbstractConnector Started ServerConnector@361b2995{SSL,[ssl, http/1.1]}{apsrt3390.ccc.com:8443}
2016-10-19 08:50:57,289 INFO [main] org.eclipse.jetty.server.Server Started @87585ms
2016-10-19 08:50:58,372 INFO [main] org.apache.nifi.web.server.JettyServer Loading Flow...
2016-10-19 08:50:58,380 INFO [main] org.apache.nifi.io.socket.SocketListener Now listening for connections from nodes on port 9443
2016-10-19 08:50:58,442 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: apsrt3390.ccc.com:8443
2016-10-19 08:51:05,041 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
2016-10-19 08:51:05,041 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
2016-10-19 08:51:13,776 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
2016-10-19 08:51:13,776 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
2016-10-19 08:51:14,013 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2016-10-19 08:51:14,015 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@15cca12c Connection State changed to SUSPENDED
2016-10-19 08:51:14,019 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) [curator-framework-2.11.0.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_65]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_65]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_65]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_65]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_65]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
2016-10-19 08:51:14,020 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:838) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.11.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300

1 ACCEPTED SOLUTION

Master Guru

That means the user you are logging in as does not have permission to access the UI. You can check nifi-user.log to see the user identity that is coming from your request (it should be the DN of your cert) and compare that to what is in users.xml and authorizations.xml.

If this is your "initial admin" identity, then it should have been entered in authorizers.xml as the initial admin, and that would have granted it all the correct permissions. If you had already tried to set up an initial admin before, then you need to delete users.xml and authorizations.xml before trying to change the "initial admin"; otherwise the change won't take effect.
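
The comparison described above (identity seen in nifi-user.log versus what is stored in users.xml) can be sketched in Python. This is only an illustration under assumptions: the users.xml layout used here (`user` elements carrying an `identity` attribute) and the sample DN values are hypothetical stand-ins, and the helper names are made up for this example. As far as I know the identity strings have to match exactly, so the sketch also reports entries that differ only in whitespace or letter case.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real users.xml may differ in layout and content.
SAMPLE_USERS_XML = """
<tenants>
    <groups/>
    <users>
        <user identifier="example-id-1" identity="CN=admin,OU=NIFI"/>
    </users>
</tenants>
"""


def known_identities(users_xml):
    """Collect the identity attribute of every <user> element."""
    root = ET.fromstring(users_xml)
    return {u.get("identity") for u in root.iter("user") if u.get("identity")}


def find_identity(request_identity, users_xml):
    """Return (exact_match, near_matches).

    near_matches lists stored identities that differ from the requested
    one only by whitespace or letter case, a common cause of mismatches.
    """
    def normalize(dn):
        return "".join(dn.split()).lower()

    identities = known_identities(users_xml)
    exact = request_identity in identities
    near = sorted(
        i for i in identities
        if i != request_identity and normalize(i) == normalize(request_identity)
    )
    return exact, near


# The identity would normally be copied from nifi-user.log.
exact, near = find_identity("CN=admin, OU=NIFI", SAMPLE_USERS_XML)
print("exact match:", exact)
print("near matches:", near)
```

To use it on a real installation, read the contents of conf/users.xml into a string and pass the identity exactly as it appears in nifi-user.log.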

12 REPLIES

Master Guru

Hello, It looks like one of the nodes just can't connect to ZooKeeper. In my example everything was local and there was only one embedded ZK, which isn't really a production scenario, so I assume you have something slightly different.

Can you describe the ZooKeeper setup a little bit? Are you running embedded ZK? If so, how many ZK instances are there, and how many nodes are in the NiFi cluster?

Thanks, Bryan, for the quick response. I have a 3-node cluster, and I am running ZK on all the nodes.

My nifi.properties is as below on all three nodes, with host names updated accordingly:

nifi.state.management.embedded.zookeeper.start=true

nifi.zookeeper.connect.string=apsrt3391:2181,apsrt3390:2181,apsrt3401:2181

-------------------------------------------

nifi.cluster.is.node=true

nifi.cluster.node.address=apsrt3391

nifi.cluster.node.protocol.port=11443

nifi.cluster.node.protocol.threads=10

nifi.cluster.node.event.history.size=25

nifi.cluster.node.connection.timeout=5 sec

nifi.cluster.node.read.timeout=5 sec

nifi.cluster.firewall.file=

--------------------------------------------------------------

nifi.remote.input.host=apsrt3391

nifi.remote.input.secure=true

nifi.remote.input.socket.port=10443

nifi.remote.input.http.enabled=true

nifi.remote.input.http.transaction.ttl=30 sec

-------------------------------------------------------------

nifi.web.war.directory=./lib

nifi.web.http.host=

nifi.web.http.port=

nifi.web.https.host=apsrt3391

nifi.web.https.port=8443

I updated state-management.xml:

<property name="Connect String">apsrt3391:2181,apsrt3390:2181,apsrt3402:2181</property>

And zookeeper.properties as

server.1=apsrt3390:2888:3888

server.2=apsrt3391:2888:3888

server.3=apsrt3402:2888:3888

I also created a myid file containing 1, 2, and 3 on apsrt3390, apsrt3391, and apsrt3402 respectively.

I have created one certificate that has the names of all the servers. I exported the public portion of the cert and placed it in truststore.jks. The same certificate and truststore are installed on all the servers.

The authorizers.xml file looks like this:

<authorizer>
    <identifier>file-provider</identifier>
    <class>org.apache.nifi.authorization.FileAuthorizer</class>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Users File">./conf/users.xml</property>
    <property name="Initial Admin Identity"></property>
    <property name="Legacy Authorized Users File"></property>

    <!-- Provide the identity (typically a DN) of each node when clustered, see above description of Node Identity. -->
    <property name="Node Identity 1">apsrt3390.ccc.com</property>
    <property name="Node Identity 2">apsrt3391.ccc.com</property>
    <property name="Node Identity 3">apsrt3402.ccc.com</property>
</authorizer>
</authorizers>

If the DN for all the servers is the same, can I just specify ccc.com?

Thanks very much for your help

Juthika

Master Guru

I think the issue might be a typo in:

nifi.zookeeper.connect.string=apsrt3391:2181,apsrt3390:2181,apsrt3401:2181

In other places you had apsrt3402 and not 3401.
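
A quick way to spot this kind of mismatch is to compare the host names across the three places they appear. The following is a minimal Python sketch using the values quoted in this thread; the function names are made up for this example, and the parsing assumes the simple `host:port` and `server.N=host:port:port` forms shown in the post.

```python
import re


def zk_hosts(connect_string):
    """Host names from a ZooKeeper connect string such as 'a:2181,b:2181'."""
    return {
        entry.split(":")[0].strip()
        for entry in connect_string.split(",")
        if entry.strip()
    }


def server_hosts(zookeeper_properties_text):
    """Host names from 'server.N=host:port:port' lines."""
    hosts = set()
    for line in zookeeper_properties_text.splitlines():
        match = re.match(r"\s*server\.\d+\s*=\s*([^:\s]+)", line)
        if match:
            hosts.add(match.group(1))
    return hosts


def mismatches(named_host_sets):
    """For each source, list hosts that appear elsewhere but not in it."""
    all_hosts = set().union(*named_host_sets.values())
    return {
        name: sorted(all_hosts - hosts)
        for name, hosts in named_host_sets.items()
        if all_hosts - hosts
    }


# Values copied from the configuration posted in this thread.
sources = {
    "nifi.properties": zk_hosts("apsrt3391:2181,apsrt3390:2181,apsrt3401:2181"),
    "state-management.xml": zk_hosts("apsrt3391:2181,apsrt3390:2181,apsrt3402:2181"),
    "zookeeper.properties": server_hosts(
        "server.1=apsrt3390:2888:3888\n"
        "server.2=apsrt3391:2888:3888\n"
        "server.3=apsrt3402:2888:3888"
    ),
}

for name, missing in mismatches(sources).items():
    print(f"{name} is missing: {', '.join(missing)}")
```

Any source reported here disagrees with at least one of the others, which is worth resolving before looking at deeper causes.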

I'm not totally sure about the setup of having one certificate that has all the server names. The DN is the distinguished name and is usually different per server, and each server would have a keystore that has that certificate for the DN of the given server. In your example the DN is not ccc.com; it should be something like:

CN=apsrt3391, OU=...
CN=apsrt3390, OU=...
CN=apsrt3402, OU=...

And each of them needs to be listed as a node identity in authorizers.xml.
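
As a rough check, the Node Identity entries can be pulled out of the authorizer configuration and flagged when they do not look like a DN. This Python sketch embeds a trimmed copy of the snippet posted above; the `looks_like_dn` test (presence of a `CN=` component) is only a heuristic I am assuming for illustration, not a rule NiFi itself applies.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <authorizer> element posted earlier in the thread.
AUTHORIZER_SNIPPET = """
<authorizer>
    <identifier>file-provider</identifier>
    <class>org.apache.nifi.authorization.FileAuthorizer</class>
    <property name="Initial Admin Identity"></property>
    <property name="Node Identity 1">apsrt3390.ccc.com</property>
    <property name="Node Identity 2">apsrt3391.ccc.com</property>
    <property name="Node Identity 3">apsrt3402.ccc.com</property>
</authorizer>
"""


def node_identities(authorizer_xml):
    """Map each 'Node Identity N' property name to its value."""
    root = ET.fromstring(authorizer_xml)
    return {
        prop.get("name"): (prop.text or "").strip()
        for prop in root.iter("property")
        if (prop.get("name") or "").startswith("Node Identity")
    }


def looks_like_dn(identity):
    """Heuristic only: a certificate DN normally contains a CN= component."""
    return "CN=" in identity.upper()


suspicious = {
    name: value
    for name, value in node_identities(AUTHORIZER_SNIPPET).items()
    if not looks_like_dn(value)
}

for name, value in suspicious.items():
    print(f"{name} = {value!r} does not look like a DN")
```

With the values from the post, all three entries are flagged because they are bare host names rather than DNs of the `CN=..., OU=...` form.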

Hi Bryan, good morning. Thanks, that was a typo I made while documenting the configuration for the post; I double-checked the actual file and it looks fine. Last night I removed all the security config and followed pvilliard's doc, but I still get the same error. I am wondering whether the ports are unavailable for communication. I tried this command for the ports that I configured:

(echo >/dev/tcp/localhost/10443) &>/dev/null && echo "TCP port 10443 open" || echo "TCP port 10443 close"

TCP port 10443 close

Telnet also shows connection refused.
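
The same kind of check can be run against every node and port in one go. Below is a minimal Python sketch using only the standard `socket` module; the function names are made up for this example. Note that a service bound to a specific host name may not answer on localhost, so testing each node's host name from another node can be more informative than testing localhost alone.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def report(hosts, ports, timeout=3.0):
    """Return one status line per host and port combination."""
    lines = []
    for host in hosts:
        for port in ports:
            state = "open" if port_is_open(host, port, timeout) else "closed or unreachable"
            lines.append(f"{host}:{port} {state}")
    return lines
```

For the configuration posted in this thread, a call such as `report(["apsrt3390", "apsrt3391", "apsrt3402"], [2181, 8443, 10443, 11443])` would cover the ZooKeeper client port, the HTTPS port, the site-to-site port, and the cluster protocol port; print the returned lines to see the result.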

Hi Bryan, just to update you on this: I was able to get the unsecured cluster with 3 nodes working with only one instance of ZooKeeper. I had also removed secured login earlier, so I will now try to make secure login work.

Thanks

Hi, I am not able to use the certs that I generated with a corporate tool to set up HTTPS login, so I went back to setting it up locally on my machine. I generated the certs using the nifi-toolkit and updated the properties etc. The server is up and running, but I cannot log in. It could be because I need to add the CA certs from my corporate browser to the truststore. But I can't open the truststore; it looks like the toolkit ignored the password that I provided while generating the certs. Do you know what the default password is, or how to find out what it generated?

Thanks

Juthika

Master Guru

The toolkit produces a nifi.properties with the keystore and truststore, and that nifi.properties has the values for the keystore and truststore password filled in. You should use the toolkit to generate a client cert (p12) and load that into your browser and use that to access NiFi.
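
To pull those values out of the generated file, a small Python sketch like the one below may help. The property keys are the standard `nifi.security.*` keystore and truststore settings; the sample content and its password values are hypothetical placeholders, and the helper names are made up for this example.

```python
SECURITY_KEYS = (
    "nifi.security.keystore",
    "nifi.security.keystoreType",
    "nifi.security.keystorePasswd",
    "nifi.security.truststore",
    "nifi.security.truststoreType",
    "nifi.security.truststorePasswd",
)

# Hypothetical sample; real values come from the toolkit-generated file.
SAMPLE_PROPERTIES = """
# security properties
nifi.security.keystore=./conf/keystore.jks
nifi.security.keystorePasswd=example-keystore-password
nifi.security.truststore=./conf/truststore.jks
nifi.security.truststorePasswd=example-truststore-password
"""


def load_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def security_settings(text):
    """Return the keystore and truststore settings, empty string if unset."""
    props = load_properties(text)
    return {key: props.get(key, "") for key in SECURITY_KEYS}


for key, value in security_settings(SAMPLE_PROPERTIES).items():
    print(f"{key} = {value}")
```

On a real node, pass the contents of the generated nifi.properties instead of the sample string. The truststore password found this way can then be used to open the truststore, for example with `keytool -list -keystore <truststore file> -storepass <password>`.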

Thank you very much. I thought it was an encrypted password, but I was able to use it to open the truststore. 🙂

I am getting this error

[Attached screenshot: 8852-yeepi.png]

Can you please suggest what I should fix? Thanks

Juthika