05-04-2020 04:55 PM
I managed to fix this. It was a faulty HAProxy config: the frontends for the management services were missing the default_backend directive. The issue has thus been resolved.
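For anyone hitting the same problem, here is a minimal sketch (Python 3) of how one might scan an HAProxy config for frontend sections that never set default_backend. The function name and the sample config are hypothetical and only for illustration; the parsing is deliberately simple, ignores use_backend rules, and is not a substitute for HAProxy's own config check.

```python
import re

# Section keywords that start a new block in an HAProxy config.
SECTION_RE = re.compile(r"^(global|defaults|frontend|backend|listen)\b\s*(\S*)")


def frontends_missing_default_backend(config_text):
    """Return the names of frontend sections that never set default_backend."""
    missing = []
    current = None       # name of the frontend being scanned, if any
    has_default = False  # whether that frontend has a default_backend line
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = SECTION_RE.match(line)
        if match:
            # A new section starts: close out the previous frontend, if any.
            if current is not None and not has_default:
                missing.append(current)
            current = match.group(2) if match.group(1) == "frontend" else None
            has_default = False
        elif current is not None and line.split()[0] == "default_backend":
            has_default = True
    if current is not None and not has_default:
        missing.append(current)
    return missing


# Hypothetical sample modelled on the config posted in this thread.
sample = """
frontend cmfavro
    bind *:7182
    mode tcp
    option tcplog
    default_backend cmfavro

backend cmfavro
    server cmfavro1 cloudera-managerA.sg.com:7182 check

frontend mgmt1
    bind *:5678
    mode tcp
    option tcplog

backend mgmt1
    server mgmt1a management-serverA.sg.com check
"""

print(frontends_missing_default_backend(sample))
```

In this sample only mgmt1 is reported, which matches the pattern in the posted config where the cmf and cmfavro frontends set default_backend and the mgmt ones do not.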
04-05-2020 10:28 PM
On the MGMT server, I set listening_hostname to the server's own hostname instead of the LB name set on the HAProxy server, and it works fine. I suspect this has to do with the HAProxy config; however, I have done exactly as dictated in the Cloudera documentation. Not sure what is missing.
04-05-2020 05:21 PM
Hello everyone, can someone please provide some advice about the fix? Thanks, Mithun.
03-31-2020 06:52 AM
I had Cloudera Manager (cms), the management service (mgmt) and the database (MariaDB) running on a single VM. In order to enable HA for cms, I separated mgmt and the database onto independent servers. I shut down scm-agent on the cms server and have it running on the mgmt server. I also deployed HAProxy (v1.8) for load balancing as advised in the documentation. I migrated the databases from the current cms server to the new DB server.
Ref# https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_cm_ha_hosts.html
The DB and NFS mounts for the management service directories are served by a single host.
After following the doc exactly, the management roles start up fine in the UI without errors; however, the status is always "Unknown Health".
At the top, the UI shows this message:
The status of the services is "?".
The logs on the management service VM have the below errors:
cloudera-scm-agent LOG Errors:
[30/Mar/2020 16:16:35 +0000] 10723 DnsResolutionMonitor throttling_logger WARNING hostname sgsg2s214 differs from the canonical name cloudera-sg-mgmt-it-simulation.sg.flowtraders.local
[30/Mar/2020 16:17:05 +0000] 10723 MonitorDaemon-Reporter firehoses INFO Creating a connection to the ACTIVITYMONITOR.
[30/Mar/2020 16:17:05 +0000] 10723 MonitorDaemon-Reporter firehoses INFO Creating a connection to the SERVICEMONITOR.
[30/Mar/2020 16:17:05 +0000] 10723 MonitorDaemon-Reporter firehoses INFO Creating a connection to the HOSTMONITOR.
[30/Mar/2020 16:17:05 +0000] 10723 MonitorDaemon-Reporter throttling_logger ERROR Error sending messages to firehose: mgmt-HOSTMONITOR-ea8a1f6943cbb1b40cd5fd8bdb7e1e51
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/firehose.py", line 121, in _send
    self._port)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.7/httplib.py", line 824, in connect
    self.timeout, self.source_address)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
HOSTMONITOR LOG Errors:
2020-03-31 20:55:55,805 INFO com.cloudera.cmon.tstore.leveldb.LDBPartitionManager: Opening partition LDBPartitionMetadataWrapper{tableName=ts_subject, partitionName=ts_subject_2020-03-30T13:52:40.108Z, startTime=2020-03-30T13:52:40.108Z, endTime=null, version=9, state=CLOSED}
2020-03-31 20:55:55,981 WARN com.cloudera.cmon.firehose.HMONToSMONHostSubjectRecordPublisher: Failed to send messages to SMON.
java.lang.reflect.UndeclaredThrowableException
    at com.sun.proxy.$Proxy23.writeStatusRecords(Unknown Source)
    at com.cloudera.cmon.firehose.BasicFirehoseClient.writeStatusRecords(BasicFirehoseClient.java:75)
    at com.cloudera.cmon.firehose.HMONToSMONHostSubjectRecordPublisher.processRecords(HMONToSMONHostSubjectRecordPublisher.java:107)
    at com.cloudera.cmon.tstore.leveldb.LDBSubjectRecordStore.write(LDBSubjectRecordStore.java:399)
    at com.cloudera.cmon.kaiser.HMONTestRunner.runHostTestsForSession(HMONTestRunner.java:86)
    at com.cloudera.cmon.kaiser.HMONTestRunner.runTestsForSession(HMONTestRunner.java:66)
    at com.cloudera.cmon.kaiser.BaseTestRunner.runTestsOnAllSubjects(BaseTestRunner.java:143)
    at com.cloudera.cmon.kaiser.KaiserService$KaiserServiceRunner.run(KaiserService.java:138)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused (Connection refused)
    at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:104)
    ... 9 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1334)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1309)
    at org.apache.avro.ipc.HttpTransceiver.writeBuffers(HttpTransceiver.java:77)
    at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:58)
    at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:72)
    at org.apache.avro.ipc.Requestor.request(Requestor.java:147)
    at org.apache.avro.ipc.Requestor.request(Requestor.java:101)
    at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:88)
    ... 9 more
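Since both logs end in "Connection refused", one way to narrow this down is to check which ports actually accept a TCP connection on the name the roles are told to use. Below is a minimal sketch, assuming Python 3 is available on the host running the check (the agent itself uses Python 2.7, per the traceback). The helper names port_is_open and probe are hypothetical.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection resolves the name and connects; the "with"
        # block closes the socket again as soon as the check is done.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and DNS resolution failures.
        return False


def probe(host, ports, timeout=3.0):
    """Map each port to whether it accepted a TCP connection on host."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

For example, `probe("MGMTserver", [9994, 9995, 9996, 9997, 9998, 9999])` would show which of those ports answer through the load balancer name; running the same probe against the management host's own hostname shows whether the refusal comes from the proxy or from the role itself. The port list here is simply taken from the HAProxy config in this post.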
The HAProxy host has 2 IPs (all the IPs used for all the servers are accessible; no firewall of any kind is involved). Each IP has a DNS A record registered, for CMSserver and MGMTserver respectively.
On the management server, the file /etc/cloudera-scm-agent/config.ini has
server_host=CMSserver (the DNS name reserved on the proxy server for the cloudera manager)
listening_hostname=MGMTserver (the DNS name reserved on the proxy for the management service host)
On the management server, /etc/hosts also has an entry for its local IP pointing to the MGMTserver name (the proxy host), as per the documentation.
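To double-check what the agent will actually read, the two settings can be pulled out of config.ini programmatically. This is a rough sketch in Python 3; the function name is hypothetical, and the [General] section header in the sample is my assumption about the file layout, which is why the code searches every section instead of relying on it.

```python
import configparser


def agent_host_settings(config_text, keys=("server_host", "listening_hostname")):
    """Return the requested keys found in any section of an INI-style text."""
    # interpolation=None keeps literal "%" characters from raising errors;
    # strict=False tolerates duplicated options or sections.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(config_text)
    found = {}
    for section in parser.sections():
        for key in keys:
            if parser.has_option(section, key):
                found[key] = parser.get(section, key)
    return found


# Hypothetical sample using the placeholder names from this post.
sample_ini = """
[General]
server_host=CMSserver
# listening_ip=
listening_hostname=MGMTserver
"""

print(agent_host_settings(sample_ini))
```

On a real host, the text would come from reading /etc/cloudera-scm-agent/config.ini; a key that is commented out simply does not appear in the result.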
I ran Inspect Host and the error from that is as follows:
Additionally, the detailed logs in the UI can't be accessed. The below error is thrown:
I also tried dropping the amon database that was migrated from the old setup, assuming some data corruption. No luck.
Running the below Python command on the management service host also gives me the proxy server's DNS name as expected, which is what is used in config.ini for listening_hostname.
python -c 'import socket; print socket.getfqdn(),socket.gethostbyname(socket.getfqdn())'
The HAproxy config is below.
frontend cmf
    bind *:7180
    mode tcp
    option tcplog
    default_backend cmf

backend cmf
    server cmfhttp1 cloudera-managerA.sg.com:7180 check
    server cmfhttp2 cloudera-managerB.sg.com:7180 check

frontend cmfavro
    bind *:7182
    mode tcp
    option tcplog
    default_backend cmfavro

backend cmfavro
    server cmfavro1 cloudera-managerA.sg.com:7182 check
    server cmfavro2 cloudera-managerB.sg.com:7182 check

frontend mgmt1
    bind *:5678
    mode tcp
    option tcplog

backend mgmt1
    server mgmt1a management-serverA.sg.com check
    server mgmt1b management-serverB.sg.com check

frontend mgmt2
    bind *:7184
    mode tcp
    option tcplog

backend mgmt2
    server mgmt2a management-serverA.sg.com check
    server mgmt2b management-serverB.sg.com check

frontend mgmt3
    bind *:7185
    mode tcp
    option tcplog

backend mgmt3
    server mgmt3a management-serverA.sg.com check
    server mgmt3b management-serverB.sg.com check

frontend mgmt4
    bind *:7186
    mode tcp
    option tcplog

backend mgmt4
    server mgmt4a management-serverA.sg.com check
    server mgmt4b management-serverB.sg.com check

frontend mgmt5
    bind *:7187
    mode tcp
    option tcplog

backend mgmt5
    server mgmt5a management-serverA.sg.com check
    server mgmt5b management-serverB.sg.com check

frontend mgmt6
    bind *:8083
    mode tcp
    option tcplog

backend mgmt6
    server mgmt6a management-serverA.sg.com check
    server mgmt6b management-serverB.sg.com check

frontend mgmt7
    bind *:8084
    mode tcp
    option tcplog

backend mgmt7
    server mgmt7a management-serverA.sg.com check
    server mgmt7b management-serverB.sg.com check

frontend mgmt8
    bind *:8086
    mode tcp
    option tcplog

backend mgmt8
    server mgmt8a management-serverA.sg.com check
    server mgmt8b management-serverB.sg.com check

frontend mgmt9
    bind *:8087
    mode tcp
    option tcplog

backend mgmt9
    server mgmt9a management-serverA.sg.com check
    server mgmt9b management-serverB.sg.com check

frontend mgmt10
    bind *:8091
    mode tcp
    option tcplog

backend mgmt10
    server mgmt10a management-serverA.sg.com check
    server mgmt10b management-serverB.sg.com check

frontend mgmt-agent
    bind *:9000
    mode tcp
    option tcplog

backend mgmt-agent
    server mgmt-agenta management-serverA.sg.com check
    server mgmt-agentb management-serverB.sg.com check

frontend mgmt11
    bind *:9994
    mode tcp
    option tcplog

backend mgmt11
    server mgmt11a management-serverA.sg.com check
    server mgmt11b management-serverB.sg.com check

frontend mgmt12
    bind *:9995
    mode tcp
    option tcplog

backend mgmt12
    server mgmt12a management-serverA.sg.com check
    server mgmt12b management-serverB.sg.com check

frontend mgmt13
    bind *:9996
    mode tcp
    option tcplog

backend mgmt13
    server mgmt13a management-serverA.sg.com check
    server mgmt13b management-serverB.sg.com check

frontend mgmt14
    bind *:9997
    mode tcp
    option tcplog

backend mgmt14
    server mgmt14a management-serverA.sg.com check
    server mgmt14b management-serverB.sg.com check

frontend mgmt15
    bind *:9998
    mode tcp
    option tcplog

backend mgmt15
    server mgmt15a management-serverA.sg.com check
    server mgmt15b management-serverB.sg.com check

frontend mgmt16
    bind *:9999
    mode tcp
    option tcplog

backend mgmt16
    server mgmt16a management-serverA.sg.com check
    server mgmt16b management-serverB.sg.com check

frontend mgmt17
    bind *:10101
    mode tcp
    option tcplog

backend mgmt17
    server mgmt17a management-serverA.sg.com check
    server mgmt17b management-serverB.sg.com check
I have been struggling with this for 4 days; any help would be much appreciated.
Labels: Cloudera Manager