Can't add management service because the cluster refuses to test the external database

Explorer

Hi all!  I'm rebuilding my sandbox cluster to use an external MySQL database, and I think I'm following Cloudera's step-by-step instructions.  I created the various databases, including one for the management service itself, on my MySQL server outside the cluster.  Then I wiped out all the server and agent software in my cluster so I could do a fresh yum install (I'm using CentOS).  On the manager server, I reinstalled the cloudera-manager-server software, and then used the scm_prepare_database.sh script to set up the connection to the external db; I got "Success".

 

Then I fired up the cloudera-scm-server, waited for it to come fully online, logged into the web UI, and was prompted to go through the usual steps.  It successfully installed the agents on all 4 of my CentOS nodes, but then when it tried to distribute the parcels, it complained that all the hosts had bad health.  I clicked out to the main desktop page to look at the hosts, and sure enough, all of them have "unknown" health.  I assumed that's because there's no management service set up yet, so I went to do that, but it refuses to test the connection to the database because my manager server's health is bad:

 

Unable to test database connection for host not in good state.

 

When I check the server log, I get more or less the same message:

 

2017-03-27 15:14:04,466 INFO 266434700@scm-web-16:com.cloudera.cmf.model.DbCommand: Command null(RepMgrTestDatabaseConnection) has completed. finalstate:FINISHED, success:false, msg:Unable to test database connection for host not in good state.
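For anyone scanning a long server log for the same failure, here is a minimal sketch that pulls the command name, final state, success flag and message out of a line like the one above. The regular expression is inferred from this single quoted line, so treat the pattern (and the helper name `parse_command_result`) as assumptions; other Cloudera Manager versions may format the message differently.

```python
import re

# Pattern inferred from the one log line quoted in this post (an assumption,
# not a documented format).
DB_COMMAND_RE = re.compile(
    r"Command \S*\((?P<command>\w+)\) has completed\. "
    r"finalstate:(?P<state>\w+), success:(?P<success>true|false), "
    r"msg:(?P<msg>.*)$"
)


def parse_command_result(line):
    """Return a dict describing a completed command, or None if no match."""
    match = DB_COMMAND_RE.search(line)
    if match is None:
        return None
    result = match.groupdict()
    # Convert the textual flag into a real boolean.
    result["success"] = result["success"] == "true"
    return result


sample = (
    "2017-03-27 15:14:04,466 INFO 266434700@scm-web-16:"
    "com.cloudera.cmf.model.DbCommand: Command "
    "null(RepMgrTestDatabaseConnection) has completed. "
    "finalstate:FINISHED, success:false, msg:Unable to test database "
    "connection for host not in good state."
)
print(parse_command_result(sample))
```

Note that the message blames the host state rather than the database itself, which is the detail worth filtering for.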

 

I don't think it's the database connection because I can connect from my manager server using the mysql command line client with my user and password.  Seems like it won't build the management service because it can't test the database; it can't test the database because the manager server has "bad" health; all the hosts have "unknown" health because there's no management service tracking them.  Argh!  Any idea how I break this circle?  I don't think I screwed up the initial SCM database set up because when I connect to the database I see a bunch of tables in that db that must've been created by Cloudera, since I didn't do it.
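One more way to double-check the initial database setup is to look at what the prepare script recorded. The sketch below assumes the settings end up in a Java-style properties file (commonly /etc/cloudera-scm-server/db.properties, with keys prefixed `com.cloudera.cmf.db.`); that path, the key names and the sample values are assumptions for illustration and are not taken from this thread. It parses the text and prints the settings with the password masked.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def describe_db(props, prefix="com.cloudera.cmf.db."):
    """Summarize the database settings, hiding the password."""
    summary = {}
    for key, value in props.items():
        if key.startswith(prefix):
            name = key[len(prefix):]
            summary[name] = "***" if name == "password" else value
    return summary


# Hypothetical file contents; real values will differ.
example_text = """
# Sample only
com.cloudera.cmf.db.type=mysql
com.cloudera.cmf.db.host=db.example.com
com.cloudera.cmf.db.name=scm
com.cloudera.cmf.db.user=scm
com.cloudera.cmf.db.password=secret
"""
print(describe_db(parse_properties(example_text)))
```

If the host, database name and user shown here match what works from the mysql command line client, the database side is probably not the problem.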

 

Other details:
These are all VMware VMs, running CentOS 6.8.  I'm attempting to install Cloudera Manager 5.10 and CDH 5.10.

1 ACCEPTED SOLUTION

Explorer

Problem solved!  You pointed me in the right direction.  A check of the agent log showed this error:

 

[28/Mar/2017 11:28:09 +0000] 7731 MainThread agent        ERROR    Error, CM server guid updated, expected 240da00c-05c4-4053-b8a1-5ba957dfab5f, received 46d4b8a7-c2ac-4eae-8ce6-758d94046a26

 

When I googled it, it said I should wipe out /var/lib/cloudera-scm-agent/cm_guid.  Did that, and now things seem to be working fine. Thanks!
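For a cluster with several agents, the check and the fix can be sketched in a few lines. The regular expression is based only on the error line quoted above, and the helper names `find_guid_mismatch` and `remove_cached_guid` are made up for this example. The thread does not say whether the agent was restarted after the file was removed, so that step is left out here.

```python
import os
import re

# Pattern taken from the single agent log line quoted in this post.
GUID_RE = re.compile(
    r"CM server guid updated, expected (?P<expected>[0-9a-f-]{36}), "
    r"received (?P<received>[0-9a-f-]{36})"
)


def find_guid_mismatch(log_text):
    """Return (expected, received) from the first mismatch line, else None."""
    for line in log_text.splitlines():
        match = GUID_RE.search(line)
        if match:
            return match.group("expected"), match.group("received")
    return None


def remove_cached_guid(path="/var/lib/cloudera-scm-agent/cm_guid"):
    """Delete the cached GUID file; return True if a file was removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


sample_log = (
    "[28/Mar/2017 11:28:09 +0000] 7731 MainThread agent        ERROR    "
    "Error, CM server guid updated, expected "
    "240da00c-05c4-4053-b8a1-5ba957dfab5f, received "
    "46d4b8a7-c2ac-4eae-8ce6-758d94046a26"
)
print(find_guid_mismatch(sample_log))
```

`remove_cached_guid()` is deliberately not called in the sketch: it deletes a file, typically needs root, and should only be run on a host whose log actually shows the mismatch.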


3 REPLIES

Guru

Hello,

Thanks for reaching out to the community.

 

After reading the scenario, I think it is quite possible that some leftover files on the sandbox were not cleaned up before the reinstall. That could prevent all the agent nodes from heartbeating to the CM server successfully. By default, each agent sends a heartbeat to the CM server to report its health. Right now, this is broken on your cluster.
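As a quick first check from an agent host, a minimal sketch like the one below tests whether the CM server's heartbeat port accepts TCP connections at all. It assumes the default agent port 7182 (the `server_port` setting in the agent's config.ini); the function name `can_reach` and the host name in the usage comment are hypothetical.

```python
import socket


def can_reach(host, port=7182, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    7182 is assumed to be the heartbeat port; adjust it if the
    agent configuration uses a different value.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Usage (hypothetical host name):
#   can_reach("cm-server.example.com")
```

A successful connection only shows that the port is reachable. Heartbeats can still be rejected for other reasons, so the agent log remains the place to look.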

 

You are not in a "circle", since the Cloudera Management Service has nothing to do with this issue. Once you resolve the problem of the agents not being able to talk to the CM server, you should be able to install the Cloudera Management Service easily.

 

The question right now is how to resolve the agent issue. In order to find out what went wrong, we need to look into:
1) The CM agent log, which is located on the agent host. By default, the path is /var/log/cloudera-scm-agent/cloudera-scm-agent.log
2) The CM server log, which is located on the CM host. By default, the path is /var/log/cloudera-scm-server/cloudera-scm-server.log
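Both logs can be long, so it may help to pull out only the most recent error lines before posting. The sketch below is one way to do that, assuming the log level appears as a standalone token such as ERROR (as in the log excerpts quoted elsewhere in this thread); the function name `last_errors` is made up for this example.

```python
from collections import deque


def last_errors(lines, levels=("ERROR",), limit=20):
    """Return up to `limit` of the most recent lines logged at `levels`."""
    recent = deque(maxlen=limit)  # older matches fall off the front
    for line in lines:
        tokens = line.split()
        if any(level in tokens for level in levels):
            recent.append(line.rstrip("\n"))
    return list(recent)


# Usage on an agent host (default path from the list above):
#   path = "/var/log/cloudera-scm-agent/cloudera-scm-agent.log"
#   with open(path, errors="replace") as fh:
#       for line in last_errors(fh):
#           print(line)
```

Passing the open file object directly avoids loading the whole log into memory, and the `deque` keeps only the newest matches.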

 

Please post the above information so we can look into this issue together.

 

Thanks,
Li
Cloudera support

Li Wang, Technical Solution Manager



Guru

You are very welcome! Glad to hear the issue got resolved.

 

Cheers,

Li
