One of our clients is running Cloudera Manager Express 5.3.2 and CDH 5.3.2.
There is a strange problem with the Impala Catalog Server.
It shows Bad Health in Cloudera Manager, with the following messages:
StateStore Connectivity Bad
This Catalog Server is not connected to its StateStore.
I can't find anything useful in the log files of the Catalog Server or the StateStore.
Otherwise, everything seems to be working properly, and metadata changes are propagated successfully (I performed several tests using 2 different DataNodes).
In the StateStore Web UI I can see the Catalog Server in the list of the subscribers.
Subscribed topics: 1
Transient entries: 0
If I stop the Catalog Server, this entry disappears, so it is not a stale one.
I even tried deleting the whole Impala service and installing it again, using a different node for the StateStore and the Catalog Server. However, nothing seems to help.
Kerberos and SSL are not used in the cluster.
I suspect that the problem is only with the Cloudera Manager health check, but I have no idea how to troubleshoot further.
Please let me know if you have any suggestions.
The cluster is now running CM/CDH 5.7.1, and the Catalog Server Connectivity health test still reports "This Catalog Server is not connected to its StateStore."
I am keeping the check disabled; I enabled it only temporarily to see whether the issue had been resolved.
It appears to be a rare issue, since searching Google turns up nothing useful.
This issue could be caused by the Hive metastore. First, check whether the Impala catalog server is listening on ports 23020 and 26000. If it is not, you may have the same problem as I had.
In my case, I suspect that the Impala catalog server kept trying to connect to the Hive metastore without success. The timeout for these connection attempts seems to be several hours, and during that period the catalog server does not listen on ports 23020 and 26000. As a result, the Impala StateStore cannot establish a connection with the catalog server, which makes the health test fail. Sometimes the health test passes after several hours, but that does not mean everything is OK. My guess is that the catalog server simply gives up retrying and moves on to start its service, so the StateStore can then connect to it. However, when you execute queries, Impala will report errors.
To fix this problem, you have to solve the Hive metastore issues, or simply rebuild the metastore. The following links give some details on that:
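The port check can be done with a plain TCP connect test. Below is a minimal sketch. It assumes the default catalogd ports mentioned in this thread (23020, 25020, 26000); `is_listening` and `check_catalogd` are hypothetical helper names and are not part of any Cloudera or Impala tool. Running it from the StateStore host exercises the same network path the StateStore uses.

```python
import socket
import sys

# Assumed default catalogd ports, as mentioned in this thread; adjust them
# if your cluster overrides the defaults.
CATALOGD_PORTS = {
    23020: "StateStore subscriber",
    25020: "debug web UI",
    26000: "catalog service",
}


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


def check_catalogd(host: str, ports=CATALOGD_PORTS, timeout: float = 2.0) -> dict:
    """Map each port number to whether it accepted a connection."""
    return {port: is_listening(host, port, timeout) for port in ports}


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for port, ok in check_catalogd(target).items():
        state = "listening" if ok else "NOT listening"
        print(f"{target}:{port} ({CATALOGD_PORTS[port]}): {state}")
```

For example, `python check_catalogd.py catalog-host.example.com` (hypothetical host name). If 23020 and 26000 are not listening while the process is running, that matches the metastore scenario described here.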
In my case, catalogd listens on ports 23020, 25020 and 26000.
Everything else also seems to be working fine.
I can see this subscriber in the StateStore UI:
And also in the Catalog Server UI:
statestore-subscriber.connected = true
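To compare Cloudera Manager's verdict with what catalogd itself reports, the metric can also be read from the Catalog Server web UI. Below is a minimal sketch. It assumes the UI is on port 25020 and exposes a `/metrics` page containing `statestore-subscriber.connected`. The page format differs between Impala versions, so the regex is a loose heuristic covering `name = value` text and simple HTML table cells. `parse_connected` and `fetch_connected` are hypothetical helpers.

```python
import re
import urllib.request
from typing import Optional

# Heuristic: the metric name, then separators such as " = ", '": ' or HTML
# tags like "</td><td>", then true/false. The page layout is an assumption.
_CONNECTED_RE = re.compile(
    r'statestore-subscriber\.connected(?:\s|=|:|"|<[^>]*>)*(true|false)',
    re.IGNORECASE,
)


def parse_connected(page_text: str) -> Optional[bool]:
    """Return the statestore-subscriber.connected value, or None if not found."""
    match = _CONNECTED_RE.search(page_text)
    if match is None:
        return None
    return match.group(1).lower() == "true"


def fetch_connected(host: str, port: int = 25020, timeout: float = 5.0) -> Optional[bool]:
    """Fetch the (assumed) /metrics page of the catalogd web UI and parse it."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    return parse_connected(text)
```

If `fetch_connected(...)` returns `True` while the Cloudera Manager test keeps failing, the discrepancy is on the monitoring side rather than in catalogd's own view of the connection.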
So it really appears that the check is producing a false positive, but I don't know how to debug it.
It would be helpful if anyone could hint at which log to look in to find out why the check is failing.