Support Questions

cjervis · 03-31-2020

I had Cloudera Manager (CMS), the Cloudera Management Service (MGMT), and the database (MariaDB) running on a single VM. To enable HA for CMS, I moved MGMT and the database to independent servers. I shut down the scm-agent on the CMS server and now run it on the MGMT server. I also deployed HAProxy (v1.8) for load balancing, as advised in the documentation, and migrated the databases from the existing CMS server to the new DB server.

Ref# https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_cm_ha_hosts.html

The DB and NFS mounts for the management service directories are served by a single host.

After following the documentation exactly, the management roles start up in the UI without errors, but their status is always "Unknown Health".

At the top of the page, it shows this message:

The status of each service is shown as "?".

The logs on the management service VM contain the errors below:

cloudera-scm-agent LOG Errors:

[30/Mar/2020 16:16:35 +0000] 10723 DnsResolutionMonitor throttling_logger WARNING hostname sgsg2s214 differs from the canonical name cloudera-sg-mgmt-it-simulation.sg.flowtraders.local
[30/Mar/2020 16:17:05 +0000] 10723 MonitorDaemon-Reporter firehoses INFO Creating a connection to the ACTIVITYMONITOR.
[30/Mar/2020 16:17:05 +0000] 10723 MonitorDaemon-Reporter firehoses INFO Creating a connection to the SERVICEMONITOR.
[30/Mar/2020 16:17:05 +0000] 10723 MonitorDaemon-Reporter firehoses INFO Creating a connection to the HOSTMONITOR.
[30/Mar/2020 16:17:05 +0000] 10723 MonitorDaemon-Reporter throttling_logger ERROR Error sending messages to firehose: mgmt-HOSTMONITOR-ea8a1f6943cbb1b40cd5fd8bdb7e1e51
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/firehose.py", line 121, in _send
    self._port)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.7/httplib.py", line 824, in connect
    self.timeout, self.source_address)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused

HOSTMONITOR LOG Errors:

2020-03-31 20:55:55,805 INFO com.cloudera.cmon.tstore.leveldb.LDBPartitionManager: Opening partition LDBPartitionMetadataWrapper{tableName=ts_subject, partitionName=ts_subject_2020-03-30T13:52:40.108Z, startTime=2020-03-30T13:52:40.108Z, endTime=null, version=9, state=CLOSED}
2020-03-31 20:55:55,981 WARN com.cloudera.cmon.firehose.HMONToSMONHostSubjectRecordPublisher: Failed to send messages to SMON.
java.lang.reflect.UndeclaredThrowableException
    at com.sun.proxy.$Proxy23.writeStatusRecords(Unknown Source)
    at com.cloudera.cmon.firehose.BasicFirehoseClient.writeStatusRecords(BasicFirehoseClient.java:75)
    at com.cloudera.cmon.firehose.HMONToSMONHostSubjectRecordPublisher.processRecords(HMONToSMONHostSubjectRecordPublisher.java:107)
    at com.cloudera.cmon.tstore.leveldb.LDBSubjectRecordStore.write(LDBSubjectRecordStore.java:399)
    at com.cloudera.cmon.kaiser.HMONTestRunner.runHostTestsForSession(HMONTestRunner.java:86)
    at com.cloudera.cmon.kaiser.HMONTestRunner.runTestsForSession(HMONTestRunner.java:66)
    at com.cloudera.cmon.kaiser.BaseTestRunner.runTestsOnAllSubjects(BaseTestRunner.java:143)
    at com.cloudera.cmon.kaiser.KaiserService$KaiserServiceRunner.run(KaiserService.java:138)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused (Connection refused)
    at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:104)
    ... 9 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1334)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1309)
    at org.apache.avro.ipc.HttpTransceiver.writeBuffers(HttpTransceiver.java:77)
    at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:58)
    at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:72)
    at org.apache.avro.ipc.Requestor.request(Requestor.java:147)
    at org.apache.avro.ipc.Requestor.request(Requestor.java:101)
    at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:88)
    ... 9 more

The HAProxy host has 2 IPs (all IPs used by all the servers are reachable; no firewall of any kind is involved). Each IP has a DNS A record, registered for the CMSserver and MGMTserver names respectively.
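
Since both logs above end in "Connection refused", one way to narrow things down is to probe each management port through the proxy name and directly on the management host. The following is a minimal sketch, not an official Cloudera tool: the port list is copied from the frontends in the HAProxy config in this post, and the helper names `probe` and `probe_all` are made up for illustration.

```python
import socket

# Ports copied from the "bind" lines of the mgmt frontends in this post.
MGMT_PORTS = [5678, 7184, 7185, 7186, 7187, 8083, 8084, 8086, 8087, 8091,
              9000, 9994, 9995, 9996, 9997, 9998, 9999, 10101]


def probe(host, port, timeout=2.0):
    """Return 'open', 'refused', or a short description of another error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:  # timeouts, DNS failures, unreachable hosts
        return "error: {}".format(exc)


def probe_all(host, ports):
    """Probe every port on host and return a {port: status} mapping."""
    return {port: probe(host, port) for port in ports}
```

For example, `probe_all("MGMTserver", MGMT_PORTS)` could be run from the CMS host and compared with the same call against the management host's own hostname (MGMTserver is the placeholder name used in this post). Note that "open" only means something accepted the TCP connection; it does not prove the proxy forwarded it to a backend.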

On the management server, the file /etc/cloudera-scm-agent/config.ini has

server_host=CMSserver (the DNS name reserved on the proxy server for the cloudera manager)

listening_hostname=MGMTserver (the DNS name reserved on the proxy for the management service host)

On the management server, /etc/hosts also has an entry mapping its local IP to the MGMTserver name (the name registered on the proxy host), as per the documentation.
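
To double-check what the agent will actually read, the two settings can be pulled out of config.ini programmatically. This is a rough sketch that assumes the keys live in a `[General]` section, as in a stock agent config; the sample text and the function name `agent_hosts` are illustrative, and the host names are the placeholders used in this post.

```python
import configparser

# Illustrative stand-in for /etc/cloudera-scm-agent/config.ini.
# The [General] section name is an assumption about the file layout.
SAMPLE = """\
[General]
server_host=CMSserver
listening_hostname=MGMTserver
"""


def agent_hosts(ini_text):
    """Return the two host-related settings from an agent config.ini text."""
    # interpolation=None so a literal '%' in the file is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    general = parser["General"]
    return {
        "server_host": general.get("server_host"),
        "listening_hostname": general.get("listening_hostname"),
    }


print(agent_hosts(SAMPLE))
# {'server_host': 'CMSserver', 'listening_hostname': 'MGMTserver'}
```

A key that is commented out or absent comes back as None, which makes it easy to spot a listening_hostname that was never actually set.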

I ran an Inspect Host, and the error from that is as follows:

Additionally, the detailed logs cannot be accessed from the UI. The error below is thrown:

I also tried dropping the amon database that was migrated from the old setup, suspecting data corruption. No luck.

Running the Python command below on the management service host also returns the proxy server's DNS name as expected, which is the name used for listening_hostname in config.ini.

python -c 'import socket; print socket.getfqdn(),socket.gethostbyname(socket.getfqdn())'
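
The one-liner above uses the Python 2 print statement, which matches the python2.7 interpreter seen in the agent traceback. On a host where only Python 3 is available, an equivalent check might look like the sketch below; the function names are made up for illustration, and the comparison helper is just one possible way to match the result against listening_hostname.

```python
import socket


def local_identity():
    """Return (fqdn, ip); ip is None if the FQDN does not resolve."""
    fqdn = socket.getfqdn()
    try:
        ip = socket.gethostbyname(fqdn)
    except OSError:  # covers socket.gaierror and socket.herror
        ip = None
    return fqdn, ip


def matches_listening_hostname(expected, fqdn):
    """Compare two host names, ignoring case and a trailing dot."""
    return expected.rstrip(".").lower() == fqdn.rstrip(".").lower()


fqdn, ip = local_identity()
print(fqdn, ip)
```

Calling `matches_listening_hostname("MGMTserver", fqdn)` then tells you whether the name the host reports for itself agrees with the value in config.ini (MGMTserver being the placeholder name from this post).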

The HAproxy config is below.

frontend cmf
bind *:7180
mode tcp
option tcplog
default_backend cmf

backend cmf
server cmfhttp1 cloudera-managerA.sg.com:7180 check
server cmfhttp2 cloudera-managerB.sg.com:7180 check

frontend cmfavro
bind *:7182
mode tcp
option tcplog
default_backend cmfavro

backend cmfavro
server cmfavro1 cloudera-managerA.sg.com:7182 check
server cmfavro2 cloudera-managerB.sg.com:7182 check

frontend mgmt1
bind *:5678
mode tcp
option tcplog

backend mgmt1
server mgmt1a management-serverA.sg.com check
server mgmt1b management-serverB.sg.com check

frontend mgmt2
bind *:7184
mode tcp
option tcplog

backend mgmt2
server mgmt2a management-serverA.sg.com check
server mgmt2b management-serverB.sg.com check

frontend mgmt3
bind *:7185
mode tcp
option tcplog

backend mgmt3
server mgmt3a management-serverA.sg.com check
server mgmt3b management-serverB.sg.com check

frontend mgmt4
bind *:7186
mode tcp
option tcplog

backend mgmt4
server mgmt4a management-serverA.sg.com check
server mgmt4b management-serverB.sg.com check

frontend mgmt5
bind *:7187
mode tcp
option tcplog

backend mgmt5
server mgmt5a management-serverA.sg.com check
server mgmt5b management-serverB.sg.com check

frontend mgmt6
bind *:8083
mode tcp
option tcplog

backend mgmt6
server mgmt6a management-serverA.sg.com check
server mgmt6b management-serverB.sg.com check

frontend mgmt7
bind *:8084
mode tcp
option tcplog

backend mgmt7
server mgmt7a management-serverA.sg.com check
server mgmt7b management-serverB.sg.com check

frontend mgmt8
bind *:8086
mode tcp
option tcplog

backend mgmt8
server mgmt8a management-serverA.sg.com check
server mgmt8b management-serverB.sg.com check

frontend mgmt9
bind *:8087
mode tcp
option tcplog

backend mgmt9
server mgmt9a management-serverA.sg.com check
server mgmt9b management-serverB.sg.com check

frontend mgmt10
bind *:8091
mode tcp
option tcplog

backend mgmt10
server mgmt10a management-serverA.sg.com check
server mgmt10b management-serverB.sg.com check

frontend mgmt-agent
bind *:9000
mode tcp
option tcplog

backend mgmt-agent
server mgmt-agenta management-serverA.sg.com check
server mgmt-agentb management-serverB.sg.com check

frontend mgmt11
bind *:9994
mode tcp
option tcplog

backend mgmt11
server mgmt11a management-serverA.sg.com check
server mgmt11b management-serverB.sg.com check

frontend mgmt12
bind *:9995
mode tcp
option tcplog

backend mgmt12
server mgmt12a management-serverA.sg.com check
server mgmt12b management-serverB.sg.com check

frontend mgmt13
bind *:9996
mode tcp
option tcplog

backend mgmt13
server mgmt13a management-serverA.sg.com check
server mgmt13b management-serverB.sg.com check

frontend mgmt14
bind *:9997
mode tcp
option tcplog

backend mgmt14
server mgmt14a management-serverA.sg.com check
server mgmt14b management-serverB.sg.com check

frontend mgmt15
bind *:9998
mode tcp
option tcplog

backend mgmt15
server mgmt15a management-serverA.sg.com check
server mgmt15b management-serverB.sg.com check

frontend mgmt16
bind *:9999
mode tcp
option tcplog

backend mgmt16
server mgmt16a management-serverA.sg.com check
server mgmt16b management-serverB.sg.com check

frontend mgmt17
bind *:10101
mode tcp
option tcplog

backend mgmt17
server mgmt17a management-serverA.sg.com check
server mgmt17b management-serverB.sg.com check

I have been struggling with this for 4 days; any help would be much appreciated.

Mithun119 · 04-05-2020

Hello Everyone,

Can someone please provide some advice on a fix?

Thanks,

Mithun.

Mithun119 · 04-05-2020

On the MGMT server, I set listening_hostname to the server's own hostname instead of the LB name configured on the HAProxy server, and it works fine. I suspect this has to do with the HAProxy config; however, I have done exactly as the Cloudera documentation dictates. Not sure what is missing.

Mithun119 · 05-04-2020

I managed to fix this. It was a faulty HAProxy config: the frontends for the management services were missing the default_backend directive. The issue has thus been resolved.
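
For anyone hitting the same problem, a small script can flag frontends that never name a backend. This is a minimal sketch, not a full HAProxy parser: it only recognises the section keywords listed in the code, and the function name is made up for illustration. In the sample, the mgmt1 stanza is copied from the config posted above, while the mgmt2 stanza shows what the fix presumably looks like (the corrected config itself was not posted, so that part is an assumption).

```python
SAMPLE = """\
frontend mgmt1
    bind *:5678
    mode tcp
    option tcplog

backend mgmt1
    server mgmt1a management-serverA.sg.com check

frontend mgmt2
    bind *:7184
    mode tcp
    option tcplog
    default_backend mgmt2

backend mgmt2
    server mgmt2a management-serverA.sg.com check
"""

# Keywords treated as the start of a new section (not an exhaustive list).
SECTION_KEYWORDS = ("global", "defaults", "frontend", "backend", "listen")


def frontends_without_backend(config_text):
    """Return names of frontends with no default_backend or use_backend."""
    missing = []
    current = None   # name of the frontend being scanned, if any
    routed = False   # whether that frontend has named a backend yet
    for raw in config_text.splitlines():
        words = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if not words:
            continue
        if words[0] in SECTION_KEYWORDS:
            # A new section starts: close out the previous frontend first.
            if current is not None and not routed:
                missing.append(current)
            is_frontend = words[0] == "frontend" and len(words) > 1
            current = words[1] if is_frontend else None
            routed = False
        elif current is not None and words[0] in ("default_backend", "use_backend"):
            routed = True
    if current is not None and not routed:
        missing.append(current)
    return missing


print(frontends_without_backend(SAMPLE))  # ['mgmt1']
```

Run against the full config from the original post, this kind of check would list every mgmt frontend, since only cmf and cmfavro carry a default_backend line there.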

Cloudera Management Service Unknown Health after separating Management Service and DB to independent server