Zookeeper Service showing "bad health" after deployment

Expert Contributor

I just finished installing a new cluster, but the ZooKeeper service did not start cleanly. Each instance started but was flagged with "bad health". The ZooKeeper log on node 3 showed this error repeatedly:

++++
Cannot open channel to 1 at election address node1/61.62.63.1:4181
java.net.ConnectException: Connection refused
++++

From node 3, a test connection to the ZooKeeper election port on node 1 succeeded:

$ nc -zv node1 4181
Ncat: Version 7.92 ( https://nmap.org/ncat )
Ncat: Connected to 61.62.63.1:4181.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.

On node 1, the same port is listening:
$ ss -tulnp | grep 4181
tcp LISTEN 0 50 61.62.63.1:4181 0.0.0.0:*

The cluster is CDP Private Cloud Base 7.1.9 SP1.

Thank you.

Regards,

1 ACCEPTED SOLUTION

Expert Contributor

Cloudera Support helped me resolve this issue.

* The "Bad Health" status displayed in Cloudera Manager was a false-positive monitoring alert.
* The Cloudera Manager Service Monitor (SMON) was failing its TLS handshakes with ZooKeeper because of the strict endpoint identification checks introduced in modern Java runtimes (Java 17). Because SMON couldn't pull health metrics, it flagged ZooKeeper as down.
* The solution is to add a JVM argument to the SMON setting "Java Configuration Options for Service Monitor (firehose_java_opts)" that bypasses the strict certificate hostname check: `-Djdk.rmi.ssl.client.enableEndpointIdentification=false`.
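
Since the flag above turns hostname verification off rather than fixing a mismatch, it may be worth checking which names the server certificate actually covers. The following is only a rough sketch, not something Cloudera Support provided: the function names, the CA file path, and the port are hypothetical, and the thread does not say which port SMON connects to. It assumes the endpoint speaks TLS directly and that the CA bundle that signed the cluster certificates is available as a PEM file.

```python
import socket
import ssl


def san_covers(hostname, dns_names):
    """Return True if hostname matches one of the certificate's DNS SAN entries.

    Simplified matching: exact match, or a "*." wildcard that stands for
    exactly one leftmost label. Real TLS stacks apply more rules than this.
    """
    host = hostname.lower().rstrip(".")
    for name in dns_names:
        pattern = name.lower().rstrip(".")
        if pattern == host:
            return True
        if pattern.startswith("*."):
            head, _, tail = host.partition(".")
            if head and tail == pattern[2:]:
                return True
    return False


def fetch_dns_sans(host, port, ca_file, timeout=5.0):
    """Connect over TLS and return the DNS names listed in the server certificate.

    The chain is still verified against ca_file; only the hostname check is
    skipped so that a mismatching certificate can be inspected.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.check_hostname = False
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
```

For example, `san_covers("node1", fetch_dns_sans("node1", port, "/path/to/ca.pem"))` returning `False` would suggest that the certificate lists only a fully qualified name while the client connects with the short one, or the reverse. Both the port and the CA path here are placeholders to be replaced with the values from the affected cluster.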
 
This cluster runs 7.1.9 SP1 with Cloudera Manager 7.13.1. Strangely, another cluster with the same cluster version, CM version, and Java version has no such issue. It was set up six months ago.
 
Thanks for all the responses.
 
Best regards,


3 REPLIES 3

Community Manager

@Seaport Hello team @pajoshi @shubham_sharma 
Do you have any insights here? Thanks!


Regards,

Diana Torres,
Senior Community Moderator



Super Collaborator

Hello @Seaport ,

Can you check whether reverse DNS lookup works correctly from each ZooKeeper host? It is also worth looking at the zoo.cfg file under the latest /var/run/cloudera-scm-agent/process/xxxx-zookeeper-server/ directory to see how the servers and ports are configured.
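
A minimal sketch of both checks, in case it helps anyone following along. It assumes the standard ZooKeeper `server.N=host:quorumPort:electionPort` line format in zoo.cfg; the function names and the sample values in the usage note are hypothetical, and bracketed IPv6 addresses are not handled.

```python
import re
import socket

# Matches lines such as: server.1=node1:3181:4181
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)")


def parse_servers(zoo_cfg_text):
    """Return (id, host, quorum_port, election_port) for each server.N entry."""
    servers = []
    for line in zoo_cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            sid, host, quorum_port, election_port = match.groups()
            servers.append((int(sid), host, int(quorum_port), int(election_port)))
    return servers


def dns_round_trip(host):
    """Resolve host to an IPv4 address, then resolve that address back to a name.

    Raises socket.gaierror if the forward lookup fails and socket.herror
    if the reverse lookup fails.
    """
    ip = socket.gethostbyname(host)
    reverse_name = socket.gethostbyaddr(ip)[0]
    return ip, reverse_name
```

Reading the zoo.cfg text on a ZooKeeper host, passing it to `parse_servers`, and calling `dns_round_trip` on each returned host should show whether every quorum member resolves forward and backward to the name and address the other members expect. Running it on each node in turn matters, since resolution can differ per host.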
