Created 12-15-2024 06:26 PM
Hi Folks,
I have 2 NameNodes (HA), 16 DataNodes, and 5 JournalNodes. When I tried to shut down 8 DataNodes, both NameNodes went down and stopped running.
Please suggest a solution for my case. How many JournalNodes are needed to handle at least 8 DataNodes being down, i.e. a failure rate of 50% or more?
Please advise.
Created 12-16-2024 01:50 PM
@rizalt
Can you share the layout of your 18 hosts so we can better understand where the issue might be coming from?
The behavior you describe, where shutting down 8 DataNodes takes down both NameNodes in your high availability (HA) configuration, most likely points to a loss of quorum among the JournalNodes or to insufficient replicas of critical metadata blocks.
In HA mode, the NameNodes rely on the JournalNodes to store the shared edit log. For the HA setup to function, a quorum of JournalNodes (more than half) must be available.
With 5 JournalNodes, at least 3 must be operational. If shutting down 8 DataNodes impacted the connectivity or availability of more than 2 JournalNodes, the quorum would be lost, causing both NameNodes to stop functioning.
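The majority rule above is simple arithmetic. A minimal Python sketch, assuming the standard strict-majority quorum; the function names are illustrative only and are not part of any Hadoop API:

```python
def majority(n: int) -> int:
    """Smallest number of members that is strictly more than half of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many of n quorum members can be down while a majority remains."""
    return n - majority(n)


# Quorum size and failure tolerance for a few JournalNode counts
for n in (3, 5, 7, 11):
    print(f"{n} JournalNodes: quorum={majority(n)}, tolerates {tolerated_failures(n)} down")
```

For 5 JournalNodes this gives a quorum of 3 and tolerates 2 down, matching the reply. Going to an even count (for example 6) raises the quorum to 4 without raising the number of tolerated failures.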
If shutting down 8 DataNodes reduces the number of replicas below the replication factor (typically 3), the metadata might not be available, causing the NameNodes to fail.
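To make the replica-count reasoning concrete, here is a hypothetical sketch that counts, per block, how many replicas remain on DataNodes that are still up. The block IDs, hostnames, and placement map are made up for illustration; a real cluster's placement would come from the NameNode, not from a hand-written dict:

```python
from typing import Dict, List, Set


def live_replicas(placement: Dict[str, List[str]], down: Set[str]) -> Dict[str, int]:
    """Count, per block, the replicas that sit on DataNodes still up."""
    return {blk: sum(1 for dn in nodes if dn not in down)
            for blk, nodes in placement.items()}


REPLICATION_FACTOR = 3

# Hypothetical block -> DataNode placement (block IDs and hostnames are made up)
placement = {
    "blk_1": ["dn01", "dn02", "dn09"],
    "blk_2": ["dn03", "dn04", "dn05"],
    "blk_3": ["dn06", "dn10", "dn11"],
}
down = {f"dn{i:02d}" for i in range(1, 9)}  # dn01..dn08 shut down

live = live_replicas(placement, down)
missing = [b for b, n in live.items() if n == 0]
under = [b for b, n in live.items() if 0 < n < REPLICATION_FACTOR]
print("missing:", missing)         # ['blk_2']
print("under-replicated:", under)  # ['blk_1', 'blk_3']
```

A block becomes unreadable only when every one of its replicas sits on a downed node; the others are merely under-replicated.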
Please reply.
Created 12-16-2024 10:47 PM
Thanks for the reply, @Shelton.
This is my layout: 2 data centers, each with 1 NameNode and 8 DataNodes. In my case, HA needs to keep running when one DC goes down.
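Applying the JournalNode quorum rule to this two-DC layout, a hypothetical sketch that checks whether a strict majority of quorum members survives the loss of one data center. The 3/2 split of the 5 JournalNodes is an assumption for illustration only, since the post does not say where the JournalNodes run:

```python
from typing import Dict


def survives_dc_loss(members_per_dc: Dict[str, int], lost_dc: str) -> bool:
    """True if strictly more than half of all quorum members remain after lost_dc goes down."""
    total = sum(members_per_dc.values())
    remaining = total - members_per_dc[lost_dc]
    return remaining > total // 2


# Hypothetical split of the 5 JournalNodes across the two data centers
jn_split = {"DC1": 3, "DC2": 2}
for dc in jn_split:
    state = "kept" if survives_dc_loss(jn_split, dc) else "LOST"
    print(f"lose {dc}: JournalNode quorum {state}")
```

With this split, losing DC2 keeps the quorum, but losing DC1 (which holds 3 of the 5 JournalNodes) leaves only 2, which is not a majority.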
Created 12-19-2024 05:02 PM
Please help me, @Shelton. What is the maximum DataNode failure percentage? I tried installing 11 JournalNodes and 11 ZooKeeper servers, but it didn't work: out of 16 DataNodes, only 7 can fail, whether stopped or dead, before HA goes down. I need HA to keep running with 8 DataNodes dead.
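Extending the same majority arithmetic, a sketch that enumerates every way of splitting N quorum members between two data centers and keeps only the splits that survive the loss of either one. It assumes every JournalNode (and every ZooKeeper server, since a ZooKeeper ensemble also needs a strict majority) runs in one of the two DCs; the function names are hypothetical:

```python
from typing import List, Tuple


def quorum_after_loss(total: int, lost: int) -> bool:
    """True if strictly more than half of `total` members remain after losing `lost`."""
    return total - lost > total // 2


def splits_surviving_either_dc(total: int) -> List[Tuple[int, int]]:
    """All (DC1, DC2) splits of `total` members where losing either DC keeps quorum."""
    return [(a, total - a) for a in range(total + 1)
            if quorum_after_loss(total, a) and quorum_after_loss(total, total - a)]


for n in (3, 5, 11):
    print(n, splits_surviving_either_dc(n))
```

The result is an empty list for every N: the surviving DC would need more than half of the members, and both DCs cannot hold more than half at once. Under these assumptions, adding JournalNodes or ZooKeeper servers within the same two DCs does not change the outcome, which is consistent with 11 of each not helping.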