Member since 07-24-2024 · 9 Posts · 0 Kudos Received · 0 Solutions
02-03-2026
03:58 AM
I am using only public.ecr.aws/bitnami/zookeeper:latest (3 ECS nodes). This seems to be the issue, as I didn't use any special configuration. Could you provide the recommended approach for my use case?
02-02-2026
11:56 AM
After extensive troubleshooting and testing, the root cause was identified: starting all 3 ZooKeeper nodes before NiFi causes cluster instability during scheduled restarts.

Startup sequence:
1. Start a single ZooKeeper node (ZK1)
2. Start NiFi Registry (2 min wait)
3. Start all 3 NiFi nodes in parallel (3 min wait)
4. Add the remaining ZooKeeper nodes (ZK2, ZK3) to complete the ensemble

Result: all nodes connect cleanly, no flapping, stable cluster formation.

What are your thoughts about this setup?
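The staged sequence above can be sketched as a small orchestration plan. This is a minimal illustration, not the poster's actual automation: the service names, the `start_service` hook, and the wait times are assumptions mirroring the steps listed in the post.

```python
import time

# Illustrative staged-startup plan: (service, seconds to wait after starting it).
# Names and delays are assumptions based on the sequence described above.
STARTUP_PLAN = [
    ("zookeeper-1", 60),     # single ZK node first
    ("nifi-registry", 120),  # 2 min wait before the NiFi nodes
    ("nifi-1", 0),           # all three NiFi nodes in parallel...
    ("nifi-2", 0),
    ("nifi-3", 180),         # ...then a 3 min wait
    ("zookeeper-2", 0),      # complete the ensemble last
    ("zookeeper-3", 0),
]

def run_plan(plan, start_service, sleep=time.sleep):
    """Start each service in order, pausing the configured delay afterwards.

    `start_service(name)` is a placeholder for whatever actually launches the
    service, e.g. an ECS desired-count update or an ASG scale-out call.
    """
    started = []
    for name, wait_after in plan:
        start_service(name)
        started.append(name)
        if wait_after:
            sleep(wait_after)
    return started
```

Injecting `start_service` and `sleep` keeps the ordering logic testable without touching real infrastructure.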
02-02-2026
08:23 AM
Thank you for the guidance! Answers to your questions:
1. Coordinator stable (Node 2), but overloaded (90% CPU, 2.6 s heartbeat latency)
2. Yes, 8+ hours between stop/start (overnight shutdown)
3. No backlog (queues nearly empty)
4. No OOM exceptions
5. No long GC pauses observed
6. Yes, coordinator logs: "no heartbeat from node in 15089 seconds" (≈ 4.2 hr downtime)

Key issue: the Step Function starts all services in PARALLEL. ZooKeeper nodes and NiFi nodes all start together, so NiFi connects before the ZK quorum forms.

Solution implemented:
- Sequential ZK startup (ZK1 → ZK2 → ZK3, waiting for quorum)
- Parallel NiFi node startup (all 3 together after ZK is ready)
- Delete flow.json.gz on disconnected nodes → successful rejoin

Question: Should we clear the ZooKeeper /nifi state after an 8 hr shutdown, or does a stable quorum plus parallel node startup handle stale state automatically?

All nodes are now connected and stable. Will monitor through the next scheduled cycle.
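The "delete flow.json.gz on disconnected nodes" step can be captured in a small helper. A minimal sketch, assuming the NiFi conf directory path is passed in; on NiFi 2.x the local flow lives in flow.json.gz, while flow.xml.gz is the legacy 1.x name, included here defensively.

```python
from pathlib import Path

def clear_local_flow(conf_dir: str) -> list[str]:
    """Remove the node's local flow so it inherits the cluster flow on rejoin.

    NiFi 2.x stores the flow as flow.json.gz; flow.xml.gz is the 1.x legacy
    name and is removed too if present. Returns the filenames removed.
    Run this only on a STOPPED, disconnected node before restarting it.
    """
    removed = []
    for name in ("flow.json.gz", "flow.xml.gz"):
        path = Path(conf_dir) / name
        if path.exists():
            path.unlink()
            removed.append(name)
    return removed
```

On restart, a node without a local flow accepts the flow elected by the cluster, which is why this unblocks the rejoin.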
02-02-2026
04:20 AM
Environment
- NiFi 2.7.2, 3-node cluster (3 EC2 instances in an ASG)
- ZooKeeper: 3-node ensemble (ECS)
- NiFi Registry (ECS)
- AWS Step Functions for scheduled stop/start (cost optimization)

Problem
After an overnight shutdown/startup, the cluster becomes unstable:
- Nodes rapidly flap: CONNECTED → DISCONNECTED → CONNECTING (every 2-3 seconds)
- Error: "Have not received heartbeat from node"
- Critical: the disconnected node's logs show NO errors/warnings; the node appears healthy while the coordinator reports it as disconnected

Root Cause
Parallel startup via a Step Function Map state causes:
- NiFi nodes to start before the ZooKeeper quorum forms
- All 3 nodes to start simultaneously → chaotic coordinator election

Resolution
Deleted flow.json.gz on the disconnected nodes → restart → nodes rejoined successfully

Proposed Solution
Sequential startup with proper wait times:
1. ZK1 → 60s → ZK2 → 60s → ZK3 → 120s (quorum)
2. NiFi Registry → 90s
3. Node 1 → 180s → Node 2 → 120s → Node 3 → 120s
Sequential shutdown (reverse order)

Questions
- Official guidance on startup sequencing for multi-node clusters with external ZooKeeper?
- Should ZooKeeper state be cleared during scheduled shutdowns?
- Why don't the disconnected node's logs show any issues? The node appears unaware of its disconnection.
- Recommended wait times between service starts?
- Best practices for scheduled start/stop on auto-scaling infrastructure?

Setup Details
- 32 GB RAM, 20 GB heap, G1GC
- Java 21 (Amazon Corretto)
- Time sync verified (chrony, < 1 μs drift)
- Network healthy, no packet loss

Has anyone implemented similar scheduled automation for NiFi clusters? Any guidance appreciated!
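Rather than fixed 60/120-second waits, the "start NiFi only after quorum forms" condition can be checked directly. A hedged sketch: it polls ZooKeeper's `srvr` four-letter command until exactly one node reports `Mode: leader` and the rest report `Mode: follower`. Note that on ZooKeeper 3.5+ the command must be whitelisted (`4lw.commands.whitelist=srvr`); hostnames, retry counts, and delays here are illustrative.

```python
import socket
import time

def zk_mode(host: str, port: int = 2181, timeout: float = 3.0) -> str:
    """Return the 'Mode:' value from ZooKeeper's `srvr` four-letter command.

    Requires `srvr` to be whitelisted on ZooKeeper 3.5+
    (4lw.commands.whitelist=srvr in zoo.cfg).
    """
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(b"srvr")
        data = b""
        while chunk := s.recv(4096):
            data += chunk
    for line in data.decode("utf-8", "replace").splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return "unknown"

def wait_for_quorum(hosts, probe=zk_mode, attempts=30, delay=10, sleep=time.sleep):
    """Poll every ensemble member until one leader and the rest followers report in."""
    for _ in range(attempts):
        modes = []
        for host in hosts:
            try:
                modes.append(probe(host))
            except OSError:
                modes.append("unreachable")
        if modes.count("leader") == 1 and modes.count("follower") == len(hosts) - 1:
            return True
        sleep(delay)
    return False
```

Gating the NiFi startup step on `wait_for_quorum(...)` returning True replaces guessed sleep durations with an observable readiness condition.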
Labels:
- Apache NiFi
- Apache Zookeeper
- NiFi Registry
12-16-2025
10:58 AM
Thank you for the guidance. Here are the specific CVEs identified by AWS Inspector in our NiFi Registry 2.6 scan:

High Severity (12):
- CVE-2025-4802 (glibc)
- CVE-2023-31484 (perl)
- CVE-2025-6020 (pam)
- CVE-2023-52425 (expat)
- CVE-2025-66293 (libpng1.6)
- CVE-2025-32990 (gnutls28)
- CVE-2025-32988 (gnutls28)
- CVE-2025-9230 (openssl)
- CVE-2024-8176 (expat)
- CVE-2025-53066 (oracle/jdk)
- CVE-2025-64720 (libpng1.6)
- CVE-2025-65018 (libpng1.6)

Medium Severity (12):
- CVE-2025-11226 (ch.qos.logback:logback-core)
- CVE-2025-64505 (libpng1.6)
- CVE-2025-64506 (libpng1.6)
- CVE-2024-50602 (expat)
- CVE-2025-3576 (krb5)
- CVE-2025-40909 (perl)
- CVE-2024-22365 (pam)
- CVE-2025-6395 (gnutls28)
- CVE-2025-9714 (libxml2)
- CVE-2025-32989 (gnutls28)
- CVE-2025-9232 (openssl)
- CVE-2025-53057 (oracle/jdk)

Observations:
- Most vulnerabilities appear to be in system libraries (glibc, openssl, gnutls) and OS-level packages rather than NiFi Registry itself
- Several CVEs are from 2025, suggesting they may be very recent discoveries
- One application-level CVE: logback-core (logging library)

Questions:
1. Are these OS/system-level CVEs expected to be addressed by NiFi Registry updates, or should they be handled at the base image/OS level?
2. Is there a recommended approach for managing these dependencies in containerized deployments?
3. Has anyone else running NiFi Registry 2.6 seen similar scan results?

Any guidance would be appreciated.
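The OS-level vs. application-level split made in the observations can be automated when triaging scan exports. A minimal sketch under stated assumptions: the classification heuristic (coordinate-style `group:artifact` package names are Java dependencies, everything else is an OS package) and the finding tuple shape are illustrative, not AWS Inspector's native output format.

```python
from collections import defaultdict

def triage(findings):
    """Group (cve_id, package, severity) tuples by the layer that should fix them.

    Heuristic (an assumption for this sketch): Maven-style 'group:artifact'
    names (e.g. ch.qos.logback:logback-core) are application dependencies,
    fixable via an application/image rebuild; bare names (glibc, openssl)
    are base-image/OS packages, fixable by updating the base image.
    """
    buckets = defaultdict(list)
    for cve, package, severity in findings:
        layer = "application" if ":" in package else "os-base-image"
        buckets[(layer, severity)].append(cve)
    return dict(buckets)
```

Bucketing this way makes the remediation owner explicit: OS-package CVEs are typically resolved by rebuilding on a patched base image rather than waiting for a NiFi Registry release.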
12-11-2025
02:56 AM
Hello, I've installed NiFi Registry 2.6 and then scanned it with AWS Inspector, which shows the following results: 0 Critical, 12 High, 12 Medium. Could anyone confirm these results, and whether this is a stable version security-wise?
Labels:
- NiFi Registry
- Security
02-12-2025
08:43 AM
My NiFi (1.24) cluster runs on AWS ECS with 3 nodes behind an ALB. My question is how to make sure that the node users/developers land on when accessing the ALB URL is the primary one, so they don't apply their changes on a non-primary node. With almost every flow update by a developer, one of my nodes gets disconnected and is unable to sync with the primary one!
Labels:
- Apache NiFi
02-12-2025
07:06 AM
Hi, we are running NiFi 1.24. Sometimes when developers apply changes, one of the nodes gets disconnected and is unable to sync with the other nodes. Is there a way to get a re-sync button to re-sync the flows, so I don't have to delete the flow directly from the affected node?
Labels:
- Apache NiFi
02-12-2025
07:00 AM
I am running 3 NiFi nodes on AWS ECS. While the development team is working in the NiFi console, we get an error about being unable to sync, and the node is disconnected. My theory is that this happens because the changes are not being made on the primary node. How can I enforce that changes are applied directly to the primary node to avoid this issue? Can anyone help?
Labels:
- Apache NiFi
- NiFi Registry