CMS is down and I am unable to bring the services back up

Rising Star

Hello Team,

 

All of a sudden the Service Monitor went down, and when I checked the logs I saw a "connection refused" error.

I tried restarting all the CMS services, and this time all of them went down.

I then deleted the CMS and tried to re-add it to Cloudera Manager.

Now I am not able to add the CMS: when I try to add it, Activity Monitor and Reports Manager cannot find their databases at the test connection stage.

Could someone please respond as soon as possible? I would be grateful.

 

Thanks,

Vinod

 

 

1 ACCEPTED SOLUTION

Rising Star

Hello Team,

 

Sorry for the late response. I created an external metastore using MySQL and brought the services up. Now my cluster is looking good.

 

Thanks, team, for your efforts.

 

Best Regards,

Vinod


16 REPLIES

Master Mentor

@kvinod 

In such a case, you should attach the logs, because they are the only source we can use to investigate what happened.

How long had your cluster been running before it went down? Did you ever purge the CMS database?  On large deployments, it's a good strategy to isolate the monitor roles on their own hosts.

Now that you have deleted the CMS, and hence lost the data in it, you can safely create a new backend database and point the new config to that instance.

Databases to recreate:

  • Reports Manager 
  • Activity Monitor
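
If the replacement databases are created on a PostgreSQL server you administer yourself, the statements could look roughly like the sketch below. The names amon and rman are the ones that appear in this thread's db.mgmt.properties; the helper name emit_create_sql and the passwords are placeholders for illustration, not anything Cloudera ships. The helper only prints SQL, so nothing changes until you pipe its output into psql as a database superuser.

```shell
# emit_create_sql NAME PASSWORD
# Print SQL that creates a login role and a UTF8 database of the same name.
# Hypothetical helper: it writes to stdout only and never contacts a server.
emit_create_sql() {
  cat <<EOF
CREATE ROLE $1 LOGIN PASSWORD '$2';
CREATE DATABASE $1 OWNER $1 ENCODING 'UTF8';
EOF
}

# Possible usage (passwords are placeholders; assumes a local "postgres" superuser):
#   { emit_create_sql amon 'amon_password'; emit_create_sql rman 'rman_password'; } \
#     | sudo -u postgres psql
```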

Please reply with the results.

 

Rising Star

Hi Shelton,

 

Thanks for responding.

Actually, after deleting the CMS I cannot find the logs to attach. As you suggested, I have already created the databases and am trying to test the connections using those credentials.

But I am getting an error, which you can see in the screenshot.

The DB is also running on the same server. (Attachment: CMS_Error_wizard.PNG)

Can I install MySQL, create a DB and user, and enter them here?

Would that work for me?

Please share your suggestions.

 

Best Regards,

Vinod

 

 

Master Guru

@kvinod,

 

The "No database server found running" indicates that your postgres server is either not running on the host you specified or is running on another host.  If you don't know where your database server is or it is missing, make sure you know if you installed and configured your own database or if you are using the "embedded" database from Cloudera that is intended for Proof of Concept installs.

 

If you need to set up your databases again, you can use this:

 

https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_extrnl_pstgrs.html

 

After preparing the database, you can retry the install.

 

NOTE:  If you are *not* using MapReduce1 then you do not need Activity Monitor as it does not capture MR2/YARN.  If you are only using YARN for your Map Reduce, then this is a good time to get rid of Activity Monitor.

 

 

Rising Star

Hi Bgooley,

 

Thanks for responding!

Actually, on the same server I can see that the embedded DB is running:

 

$ sudo service cloudera-scm-server-db status
pg_ctl: server is running (PID: 8292)
/usr/bin/postgres "-D" "/var/lib/cloudera-scm-server-db/data"

 

$ sudo service cloudera-scm-server status
cloudera-scm-server (pid 8475) is running...
$ sudo service cloudera-scm-agent status
cloudera-scm-agent (pid 8407) is running...

 

We are using only the embedded DB. We use this cluster for testing, and some data processing was running on it.

So in this case, how can I install an external PostgreSQL DB on a host where Postgres is already installed?

 

Is there any way to make Cloudera Manager recognize the DB at that step?

Is there a config file we can modify for that?

 

Can you please help me with this issue?

You can see my current cluster health in the screenshot; I have marked with an arrow the place from which I am trying to add the Cloudera Management Service.

(Attachment: My Cluster Health.PNG)

 

Best Regards,

Vinod

Master Guru

@kvinod,

 

Thank you for all that information.  It really helps us understand the problem.

When you are specifying the database host, I note in your image that you specified just the host which means it will default to using the default postgres port (5432).  The port for the embedded db is "7432".

 

Try specifying your database host with "host:7432" (including the port) to see if that helps.

 

Since you are using the embedded database, the usernames and passwords will be created for you.

To use the existing embedded database, you will need to use the generated username and passwords that are stored for reference in the following file:


/etc/cloudera-scm-server/db.mgmt.properties

You can use the username and password specified for each database.

Based on what you mentioned, the above are likely to help.
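
To see exactly which values to type into the wizard, one option is to filter the properties file per role. This is a minimal sketch: print_db_settings is a made-up helper name, the role names ACTIVITYMONITOR and REPORTSMANAGER are assumed to match the key prefixes in your file, and reading the file will probably need root. If a role appears more than once, every candidate line is printed so that each one can be tried. The output includes passwords, so avoid pasting it into public posts.

```shell
# print_db_settings ROLE [FILE]
# Print the host, name, user and password lines recorded for one role.
# Hypothetical helper; FILE defaults to the path mentioned above.
print_db_settings() {
  grep -E "^com\.cloudera\.cmf\.$1\.db\.(host|name|user|password)=" \
    "${2:-/etc/cloudera-scm-server/db.mgmt.properties}"
}

# Possible usage:
#   print_db_settings ACTIVITYMONITOR
#   print_db_settings REPORTSMANAGER
```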

Rising Star

Hi Bgooley,

 

Thanks a lot for the quick response.

Please find the details below:

 

 

 


$ netstat -na | grep 7432
tcp 0 0 0.0.0.0:7432 0.0.0.0:* LISTEN
tcp 0 0 hostname:7432 hostname:51538 ESTABLISHED
tcp 0 0 hostname:44734 hostname:7432 ESTABLISHED
tcp 0 0 hostname:7432 hostname:51786 ESTABLISHED


$ ps -ef | grep postgres
postgres 8253 1 0 Aug27 ? 00:00:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
498 8292 1 0 Aug27 ? 00:00:02 /usr/bin/postgres -D /var/lib/cloudera-scm-server-db/data
postgres 8293 8253 0 Aug27 ? 00:00:09 postgres: logger process
postgres 8295 8253 0 Aug27 ? 00:00:39 postgres: writer process
postgres 8296 8253 0 Aug27 ? 00:00:38 postgres: wal writer process
postgres 8297 8253 0 Aug27 ? 00:00:09 postgres: autovacuum launcher process
postgres 8298 8253 0 Aug27 ? 00:00:05 postgres: stats collector process
498 8299 8292 0 Aug27 ? 00:00:09 postgres: logger process
498 8302 8292 0 Aug27 ? 00:00:44 postgres: writer process
498 8303 8292 0 Aug27 ? 00:00:37 postgres: wal writer process
498 8304 8292 0 Aug27 ? 00:00:11 postgres: autovacuum launcher process
498 8305 8292 0 Aug27 ? 00:00:24 postgres: stats collector process
498 8338 8292 0 12:27 ? 00:00:00 postgres: hive2 hive2 hostname(44730) idle
498 8339 8292 0 12:27 ? 00:00:00 postgres: hive2 hive2 hostname(44732) idle
498 8340 8292 0 12:27 ? 00:00:00 postgres: hive2 hive2 hostname(44734) idle
498 8341 8292 0 12:27 ? 00:00:00 postgres: hive2 hive2 hostname(44736) idle
498 9357 8292 0 Aug27 ? 00:00:00 postgres: scm scm hostname(51536) idle
498 9358 8292 0 Aug27 ? 00:00:01 postgres: scm scm hostname(51539) idle
498 9360 8292 0 Aug27 ? 00:01:02 postgres: scm scm hostname(51538) idle
498 9429 8292 0 Aug27 ? 00:02:11 postgres: scm scm hostname(51542) idle
498 9430 8292 0 Aug27 ? 00:01:49 postgres: scm scm hostname(51544) idle
498 12916 8292 0 Aug27 ? 00:00:08 postgres: scm scm hostname(51742) idle
498 12917 8292 0 Aug27 ? 00:00:00 postgres: scm scm hostname(51744) idle
498 12918 8292 0 Aug27 ? 00:00:00 postgres: scm scm hostname(51746) idle
498 13248 8292 0 Aug27 ? 00:00:00 postgres: scm scm hostname(51750) idle
498 13249 8292 0 Aug27 ? 00:00:00 postgres: scm scm hostname(51752) idle
498 13250 8292 0 Aug27 ? 00:01:41 postgres: scm scm hostname(51754) idle
498 13254 8292 0 Aug27 ? 00:01:26 postgres: scm scm hostname(51756) idle
498 13255 8292 0 Aug27 ? 00:00:00 postgres: scm scm hostname(51758) idle
498 13256 8292 0 Aug27 ? 00:01:08 postgres: scm scm hostname(51760) idle
498 13260 8292 0 Aug27 ? 00:00:00 postgres: scm scm hostname(51762) idle
498 13261 8292 0 Aug27 ? 00:00:00 postgres: scm scm hostname(51764) idle
498 13262 8292 0 Aug27 ? 00:00:00 postgres: scm scm hostname(51766) idle
498 13266 8292 0 Aug27 ? 00:00:00 postgres: scm scm hostname(51768) idle
498 13267 8292 0 Aug27 ? 00:00:56 postgres: scm scm hostname(51770) idle
498 13268 8292 0 Aug27 ? 00:00:00 postgres: scm scm hostname(51772) idle
498 13272 8292 0 Aug27 ? 00:00:00 postgres: scm scm hostname(51774) idle
498 13273 8292 0 Aug27 ? 00:00:29 postgres: scm scm hostname(51776) idle
498 13274 8292 0 Aug27 ? 00:01:25 postgres: scm scm hostname(51778) idle
498 13278 8292 0 Aug27 ? 00:02:40 postgres: scm scm hostname(51782) idle
498 13279 8292 0 Aug27 ? 00:00:36 postgres: scm scm hostname(51784) idle
498 13280 8292 0 Aug27 ? 00:02:13 postgres: scm scm hostname(51786) idle
498 13856 8292 0 Aug27 ? 00:00:00 postgres: scm scm hostname(51788) idle
498 13859 8292 0 Aug27 ? 00:00:00 postgres: scm scm hostname(51790) idle
498 13860 8292 0 Aug27 ? 00:00:00 postgres: scm scm hostname(51792) idle
mcaf 19041 17708 0 12:35 pts/1 00:00:00 grep postgres

 

 

$ sudo vi /etc/cloudera-scm-server/db.mgmt.properties
# The source of truth for these settings
# is the Cloudera Manager databases and
# changes made here will not be reflected
# there automatically.
#
com.cloudera.cmf.ACTIVITYMONITOR.db.type=postgresql
com.cloudera.cmf.ACTIVITYMONITOR.db.host=hostname:7432
com.cloudera.cmf.ACTIVITYMONITOR.db.name=amon
com.cloudera.cmf.ACTIVITYMONITOR.db.user=amon
com.cloudera.cmf.ACTIVITYMONITOR.db.password=4WB4R5yxnp
com.cloudera.cmf.REPORTSMANAGER.db.type=postgresql
com.cloudera.cmf.REPORTSMANAGER.db.host=hostname:7432
com.cloudera.cmf.REPORTSMANAGER.db.name=rman
com.cloudera.cmf.REPORTSMANAGER.db.user=rman
com.cloudera.cmf.REPORTSMANAGER.db.password=WceGeruLNG
com.cloudera.cmf.NAVIGATOR.db.type=postgresql
com.cloudera.cmf.NAVIGATOR.db.host=hostname:7432
com.cloudera.cmf.NAVIGATOR.db.name=nav
com.cloudera.cmf.NAVIGATOR.db.user=nav
com.cloudera.cmf.NAVIGATOR.db.password=D2tjw5xjoE
com.cloudera.cmf.ACTIVITYMONITOR.db.type=postgresql
com.cloudera.cmf.ACTIVITYMONITOR.db.host=hostname:7432
com.cloudera.cmf.ACTIVITYMONITOR.db.name=amon
com.cloudera.cmf.ACTIVITYMONITOR.db.user=amon
com.cloudera.cmf.ACTIVITYMONITOR.db.password=O31A60K5SN
com.cloudera.cmf.REPORTSMANAGER.db.type=postgresql
com.cloudera.cmf.REPORTSMANAGER.db.host=hostname:7432
com.cloudera.cmf.REPORTSMANAGER.db.name=rman
com.cloudera.cmf.REPORTSMANAGER.db.user=rman
com.cloudera.cmf.REPORTSMANAGER.db.password=BPPShP0O9k
com.cloudera.cmf.NAVIGATOR.db.type=postgresql
com.cloudera.cmf.NAVIGATOR.db.host=hostname:7432
com.cloudera.cmf.NAVIGATOR.db.name=nav
com.cloudera.cmf.NAVIGATOR.db.user=nav
com.cloudera.cmf.NAVIGATOR.db.password=QHYL7zUSQe
com.cloudera.cmf.NAVIGATORMETASERVER.db.type=postgresql
com.cloudera.cmf.NAVIGATORMETASERVER.db.host=hostname:7432
com.cloudera.cmf.NAVIGATORMETASERVER.db.name=navms
com.cloudera.cmf.NAVIGATORMETASERVER.db.user=navms
com.cloudera.cmf.NAVIGATORMETASERVER.db.password=elJRINTAth


My db.mgmt.properties file looks like this. I can see two different sets of entries for Activity Monitor and two for Reports Manager.

 

I have already tried these usernames and passwords while adding the CMS.

Please share your suggestions.

 

 

Best Regards,

Vinod

 

 

Rising Star

Hello Bgooley,

 

Can you please give me some input to overcome this issue?

 

Thanks,

Vinod

Rising Star

Team,

 

Any solution would be appreciated.

I am still facing the same issue. Can someone help me with this?

 

Thanks,

Vinod

Master Guru

@kvinod,

 

Since Reports Manager and Activity Monitor are not as essential as Host Monitor, Service Monitor, Events Server, and Alerts Publisher, I recommend adding the Management Service with only those for now. 

 

They don't use a SQL database, so you should be able to get your CM and Management Service looking good again.  You can then add the Activity Monitor and Reports Manager later without as much stress.

 

REGARDING THE PROBLEM:

I'd like to point out that the error you got in the wizard (when you took the screen shot) showed that the host/port combination provided in the Add Service wizard could not be contacted.  The "No database server found running on host ___" error is thrown if a CONNECTION REFUSED is returned by the attempt to connect.  This generally means that there was nothing listening on the requested host and port.

 

Since we found no port number (7432) specified in the database host string (in the Add Service wizard) the default of 5432 would be used.  NOTE that you have 2 postgres servers running on the host where you did the netstat:

  • Cloudera Embedded (listening on 7432)
  • Another postgres server (listening on 5432)

If that is the case and the "test connection" could connect to neither, that makes me curious whether the hostname specified in the wizard is the one where the postgres servers are listening.
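
One way to check this from the command line is to try opening a TCP connection to each port, using the same host string you typed into the wizard. This is a rough sketch, not a Cloudera tool: port_open and check_db_ports are made-up names, and it assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are available. A "refused" result matches the wizard's "No database server found running" message; it does not tell you whether the credentials are right.

```shell
# port_open HOST PORT
# Succeed if a TCP connection to HOST:PORT can be opened within 3 seconds.
# Relies on bash's /dev/tcp redirection, so bash must be installed.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# check_db_ports HOST
# Report on the two ports seen in this thread: 5432 (PostgreSQL default)
# and 7432 (Cloudera embedded database).
check_db_ports() {
  for port in 5432 7432; do
    if port_open "$1" "$port"; then
      echo "$1:$port is accepting connections"
    else
      echo "$1:$port refused the connection or is unreachable"
    fi
  done
}

# Possible usage, with the hostname exactly as entered in the wizard:
#   check_db_ports hostname
```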

 

We understand that you are under some stress, but it is very difficult to understand what is going wrong with just the information we have.  It might be useful to take a look at the stderr.log and stdout.log for the test connection command.

On the host where you are attempting to install the Reports Manager and Activity Monitor, you can look in the /var/run/cloudera-scm-agent/process directory to find the db test command process directory.  It will look like this, for instance: 2839-MGMT.ACTIVITYMONITOR-test-db-connection.

 

Inside it, you will find a logs directory containing stderr.log... you might go over that file and see if there are any other clues.
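
The lookup described above could be scripted along these lines. It is a sketch under assumptions: newest_test_dir and show_test_stderr are made-up helper names, "newest" is judged by directory modification time, and the process directory is normally readable only by root, so the functions would likely have to run in a root shell.

```shell
# newest_test_dir PROCESS_DIR
# Print the most recently modified *-test-db-connection directory, if any.
newest_test_dir() {
  ls -dt "$1"/*-test-db-connection 2>/dev/null | head -n 1
}

# show_test_stderr [PROCESS_DIR]
# Print the last 50 lines of logs/stderr.log from the newest test directory.
show_test_stderr() {
  test_dir=$(newest_test_dir "${1:-/var/run/cloudera-scm-agent/process}")
  if [ -z "$test_dir" ]; then
    echo "no test-db-connection directory found" >&2
    return 1
  fi
  echo "== $test_dir/logs/stderr.log =="
  tail -n 50 "$test_dir/logs/stderr.log"
}

# Possible usage (in a root shell):
#   show_test_stderr
```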

 

Also, you could use the psql command to test the connection yourself from the host where you are trying to add the Reports Manager and Activity Monitor.
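
A manual psql test might look like the sketch below. The helper names db_host, db_port and test_db_login are invented for illustration; the psql options used (-h, -p, -U, -d, -c) are standard. The port helper mirrors the behaviour discussed earlier in the thread: a host string without a port falls back to 5432, so the embedded database has to be addressed as host:7432. psql will prompt for the password unless the PGPASSWORD environment variable is set.

```shell
# db_host HOST[:PORT]  -> print the host part
db_host() {
  echo "${1%%:*}"
}

# db_port HOST[:PORT]  -> print the port, or 5432 (PostgreSQL default) if none is given
db_port() {
  case "$1" in
    *:*) echo "${1##*:}" ;;
    *)   echo 5432 ;;
  esac
}

# test_db_login HOST[:PORT] DATABASE USER
# Run a trivial query; success means host, port and credentials all work.
test_db_login() {
  psql -h "$(db_host "$1")" -p "$(db_port "$1")" -U "$3" -d "$2" -c 'SELECT 1;'
}

# Possible usage, with values taken from db.mgmt.properties:
#   test_db_login hostname:7432 amon amon
#   test_db_login hostname:7432 rman rman
```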