09-04-2020
03:37 AM
Hello @Atradius, thank you for reaching out to the Community about both of your NameNodes being down.

Do you see JvmPauseMonitor entries in your NameNode log saying "Detected pause in JVM or host machine" with a value larger than 1000ms? That can indicate that the service is running out of heap. If it is the NameNode, the short-term solution is to increase its heap and restart the service. The long-term solution is to identify why it ran out of heap. For example, are you facing a small-files issue? Please read article [1] on how to tackle this.

Losing quorum might also be caused by a ZooKeeper issue, when the ZK ensemble itself is not in quorum. Please check the ZK logs as well.

Please let us know if you need more input to progress with your investigation.

Best regards,
Ferenc

[1] https://blog.cloudera.com/small-files-big-foils-addressing-the-associated-metadata-and-application-challenges/
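If it helps with the log check, here is a rough sketch of how the JvmPauseMonitor entries could be filtered by pause length. It assumes the message ends with "pause of approximately <N>ms", which may differ in your version, so adjust the pattern to what your log actually contains. The function names and the 1000 ms default are only illustrative.

```python
import re

# Assumed message shape: "... Detected pause in JVM or host machine ...
# pause of approximately 4521ms". Adjust if your log lines look different.
PAUSE_RE = re.compile(
    r"Detected pause in JVM or host machine.*?pause of approximately (\d+)ms"
)


def find_long_pauses(lines, threshold_ms=1000):
    """Return (pause_ms, line) pairs for pauses longer than threshold_ms."""
    hits = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            pause_ms = int(match.group(1))
            if pause_ms > threshold_ms:
                hits.append((pause_ms, line.rstrip("\n")))
    return hits


def scan_log(path, threshold_ms=1000):
    """Scan one log file (hypothetical helper; path is supplied by the caller)."""
    with open(path, errors="replace") as log:
        return find_long_pauses(log, threshold_ms)
```

Calling scan_log with the path to your NameNode log returns the matching lines together with the pause in milliseconds. Many long pauses close together shortly before the NameNode went down would support the heap theory.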