Created on 09-02-2022 12:28 AM - edited 09-02-2022 01:14 AM
Hello,
One of our cluster nodes (a 3-node NiFi cluster, v1.16.3, running on Kubernetes) ran out of disk space. After we increased the disk space, the node now logs the above-mentioned exception and does not start.
What can we do to fix this and get the node to start again?
Thanks a lot!
Related logs:
{"@timestamp":"2022-09-02T07:49:50.865Z","log.level": "WARN","message":"Encountered unexpected End-of-File when reading journal file ../flowfile_repository/journals/474192833.journal; assuming that NiFi was shutdown unexpectedly and continuing recovery","ecs.version": "1.2.0","service.name":"nifi-app.log","service.node.name":"nifi-prod-1","event.dataset":"nifi-app.log","process.thread.name":"main","log.logger":"org.apache.nifi.wali.LengthDelimitedJournal"}
{"@timestamp":"2022-09-02T07:49:50.866Z","log.level": "INFO","message":"Successfully recovered 0 updates from journal ../flowfile_repository/journals/474192833.journal","ecs.version": "1.2.0","service.name":"nifi-app.log","service.node.name":"nifi-prod-1","event.dataset":"nifi-app.log","process.thread.name":"main","log.logger":"org.apache.nifi.wali.LengthDelimitedJournal"}
{"@timestamp":"2022-09-02T07:49:50.866Z","log.level": "INFO","message":"Recovering records from journal ../flowfile_repository/journals/474193048.journal","ecs.version": "1.2.0","service.name":"nifi-app.log","service.node.name":"nifi-prod-1","event.dataset":"nifi-app.log","process.thread.name":"main","log.logger":"org.apache.nifi.wali.LengthDelimitedJournal"}
{"@timestamp":"2022-09-02T07:49:51.051Z","log.level": "INFO","message":"Starting the following components: AffectedComponentSet[inputPorts=[], outputPorts=[], remoteInputPorts=[], remoteOutputPorts=[], processors=[], controllerServices=[], reportingTasks=[]]","ecs.version": "1.2.0","service.name":"nifi-app.log","service.node.name":"nifi-prod-1","event.dataset":"nifi-app.log","process.thread.name":"main","log.logger":"org.apache.nifi.controller.serialization.AffectedComponentSet"}
{"@timestamp":"2022-09-02T07:49:51.057Z","log.level": "WARN","message":"Failed to close Leader Selector for Cluster Coordinator","ecs.version": "1.2.0","service.name":"nifi-app.log","service.node.name":"nifi-prod-1","event.dataset":"nifi-app.log","process.thread.name":"main","log.logger":"org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager","error.type":"java.lang.IllegalStateException","error.message":"Already closed or has not been started","error.stack_trace":[
"java.lang.IllegalStateException: Already closed or has not been started",
"\tat org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:507)",
"\tat org.apache.curator.framework.recipes.leader.LeaderSelector.close(LeaderSelector.java:272)",
"\tat org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.stop(CuratorLeaderElectionManager.java:198)",
"\tat org.apache.nifi.controller.FlowController.shutdown(FlowController.java:1303)",
"\tat org.apache.nifi.controller.StandardFlowService.stop(StandardFlowService.java:331)",
"\tat org.apache.nifi.web.server.JettyServer.start(JettyServer.java:1093)",
"\tat org.apache.nifi.NiFi.<init>(NiFi.java:170)",
"\tat org.apache.nifi.NiFi.<init>(NiFi.java:82)",
"\tat org.apache.nifi.NiFi.main(NiFi.java:330)"]}
{"@timestamp":"2022-09-02T07:49:51.058Z","log.level": "INFO","message":"backgroundOperationsLoop exiting","ecs.version": "1.2.0","service.name":"nifi-app.log","service.node.name":"nifi-prod-1","event.dataset":"nifi-app.log","process.thread.name":"Curator-Framework-0","log.logger":"org.apache.curator.framework.imps.CuratorFrameworkImpl"}
{"@timestamp":"2022-09-02T07:49:51.170Z","log.level": "INFO","message":"CuratorLeaderElectionManager[stopped=true] stopped and closed","ecs.version": "1.2.0","service.name":"nifi-app.log","service.node.name":"nifi-prod-1","event.dataset":"nifi-app.log","process.thread.name":"main","log.logger":"org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager"}
{"@timestamp":"2022-09-02T07:49:51.171Z","log.level": "INFO","message":"Initiated graceful shutdown of flow controller...waiting up to 10 seconds","ecs.version": "1.2.0","service.name":"nifi-app.log","service.node.name":"nifi-prod-1","event.dataset":"nifi-app.log","process.thread.name":"main","log.logger":"org.apache.nifi.controller.FlowController"}
{"@timestamp":"2022-09-02T07:49:52.169Z","log.level": "INFO","message":"Controller has been terminated successfully.","ecs.version": "1.2.0","service.name":"nifi-app.log","service.node.name":"nifi-prod-1","event.dataset":"nifi-app.log","process.thread.name":"main","log.logger":"org.apache.nifi.controller.FlowController"}
{"@timestamp":"2022-09-02T07:49:52.189Z","log.level": "INFO","message":"Socket Listener has been terminated successfully.","ecs.version": "1.2.0","service.name":"nifi-app.log","service.node.name":"nifi-prod-1","event.dataset":"nifi-app.log","process.thread.name":"main","log.logger":"org.apache.nifi.io.socket.SocketListener"}
{"@timestamp":"2022-09-02T07:49:52.190Z","log.level": "WARN","message":"Failed to communicate with Unknown Host due to java.net.SocketException: Socket closed","ecs.version": "1.2.0","service.name":"nifi-app.log","service.node.name":"nifi-prod-1","event.dataset":"nifi-app.log","process.thread.name":"Cluster Socket Listener","log.logger":"org.apache.nifi.io.socket.SocketListener","error.type":"java.net.SocketException","error.message":"Socket closed","error.stack_trace":[
"java.net.SocketException: Socket closed",
"\tat java.net.PlainSocketImpl.socketAccept(Native Method)",
"\tat java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)",
"\tat java.net.ServerSocket.implAccept(ServerSocket.java:560)",
"\tat sun.security.ssl.SSLServerSocketImpl.accept(SSLServerSocketImpl.java:199)",
"\tat org.apache.nifi.io.socket.SocketListener$2.run(SocketListener.java:107)",
"\tat java.lang.Thread.run(Thread.java:750)"]}
{"@timestamp":"2022-09-02T07:49:52.216Z","log.level":"ERROR","message":"Unable to load flow due to: org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed to determine which connections have FlowFiles queued","ecs.version": "1.2.0","service.name":"nifi-app.log","service.node.name":"nifi-prod-1","event.dataset":"nifi-app.log","process.thread.name":"main","log.logger":"org.apache.nifi.web.server.JettyServer","error.id":"1ce175565247a450c12c939903859628","error.type":"org.apache.nifi.controller.serialization.FlowSynchronizationException","error.message":"Failed to determine which connections have FlowFiles queued","error.stack_trace":[
"org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed to determine which connections have FlowFiles queued",
"\tat org.apache.nifi.controller.inheritance.ConnectionMissingCheck.checkInheritability(ConnectionMissingCheck.java:82)",
"\tat org.apache.nifi.controller.inheritance.ConnectionMissingCheck.checkInheritability(ConnectionMissingCheck.java:63)",
"\tat org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.verifyNoConnectionsWithDataRemoved(VersionedFlowSynchronizer.java:207)",
"\tat org.apache.nifi.controller.serialization.VersionedFlowSynchronizer.sync(VersionedFlowSynchronizer.java:186)",
"\tat org.apache.nifi.controller.serialization.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:43)",
"\tat org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1524)",
"\tat org.apache.nifi.persistence.StandardFlowConfigurationDAO.load(StandardFlowConfigurationDAO.java:107)",
"\tat org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:819)",
"\tat org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:461)",
"\tat org.apache.nifi.web.server.JettyServer.start(JettyServer.java:1086)",
"\tat org.apache.nifi.NiFi.<init>(NiFi.java:170)",
"\tat org.apache.nifi.NiFi.<init>(NiFi.java:82)",
"\tat org.apache.nifi.NiFi.main(NiFi.java:330)",
"Caused by: java.io.IOException: Invalid header information - ../flowfile_repository/journals/474193048.journal does not appear to be a valid journal file.",
"\tat org.apache.nifi.wali.LengthDelimitedJournal.validateHeader(LengthDelimitedJournal.java:190)",
"\tat org.apache.nifi.wali.LengthDelimitedJournal.recoverRecords(LengthDelimitedJournal.java:428)",
"\tat org.apache.nifi.wali.SequentialAccessWriteAheadLog.recoverRecords(SequentialAccessWriteAheadLog.java:200)",
"\tat org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.findQueuesWithFlowFiles(WriteAheadFlowFileRepository.java:834)",
"\tat org.apache.nifi.controller.inheritance.ConnectionMissingCheck.checkInheritability(ConnectionMissingCheck.java:80)",
"\t... 12 more"]}
{"@timestamp":"2022-09-02T07:49:52.218Z","log.level": "WARN","message":"Failed to start web server... shutting down.","ecs.version": "1.2.0","service.name":"nifi-app.log","service.node.name":"nifi-prod-1","event.dataset":"nifi-app.log","process.thread.name":"main","log.logger":"org.apache.nifi.web.server.JettyServer","error.type":"java.lang.Exception","error.message":"Unable to load flow due to: org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed to determine which connections have FlowFiles queued","error.stack_trace":[
"java.lang.Exception: Unable to load flow due to: org.apache.nifi.controller.serialization.FlowSynchronizationException: Failed to determine which connections have FlowFiles queued",
"\tat org.apache.nifi.web.server.JettyServer.start(JettyServer.java:1096)",
"\tat org.apache.nifi.NiFi.<init>(NiFi.java:170)",
"\tat org.apache.nifi.NiFi.<init>(NiFi.java:82)",
"\tat org.apache.nifi.NiFi.main(NiFi.java:330)"]}
Created 09-02-2022 01:42 AM
OK, I found a solution.
I don't know why, but I kept missing this line in the logs:
"Caused by: java.io.IOException: Invalid header information - ../flowfile_repository/journals/474193048.journal does not appear to be a valid journal file.",
Apparently the journal file was corrupted when the disk ran out of space.
I deleted the file and the node came back up.
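For anyone hitting the same error, here is a rough Python sketch of the steps above: pull the journal path out of the "Invalid header information" log line, then move that file aside rather than deleting it outright. The function names and the backup directory are made up for illustration and are not part of NiFi. The path in the log is relative, so I assume it has to be resolved against NiFi's working directory. Stop the node first, and note that any updates that were only recorded in the removed journal are presumably lost.

```python
import re
import shutil
from pathlib import Path

# Matches the message seen in the stack trace above and captures the journal path.
JOURNAL_ERROR = re.compile(
    r"Invalid header information - (\S+\.journal) "
    r"does not appear to be a valid journal file"
)


def find_corrupt_journals(log_text):
    """Return journal paths named in 'Invalid header information' errors.

    Paths are returned in order of first appearance, without duplicates.
    """
    found = []
    for match in JOURNAL_ERROR.finditer(log_text):
        path = match.group(1)
        if path not in found:
            found.append(path)
    return found


def quarantine(journal, backup_dir):
    """Move a journal file into backup_dir (hypothetical location) instead of deleting it."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(journal).name
    shutil.move(str(journal), str(target))
    return target
```

Usage would be something like `for j in find_corrupt_journals(open("nifi-app.log").read()): quarantine(j, "/tmp/journal-backup")`, run from the directory the relative paths refer to, followed by a restart of the node. Moving instead of deleting keeps the file around in case someone wants to inspect it later.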