09-03-2013 01:50 PM
I just finished my install and have many bad health statuses and cannot determine why.
The health of the Activity Monitor is bad. The following health checks are bad: host health.
The health of the Service Monitor is bad. The following health checks are bad: host health.
The health of the Host Monitor is bad. The following health checks are bad: host health.
The health of the Event Server is bad. The following health checks are bad: host health.
The health of the Alert Publisher is bad. The following health checks are bad: host health.
The health of the Reports Manager is bad. The following health checks are bad: host health.
I'm not sure which log to look into for more information on this, but I don't see anything troubling in the cloudera-scm-server/cloudera-scm-server.log log.
Are all of these messages because I haven't configured/started to use hadoop yet? Or is something else not configured? I looked in the manual for help, but didn't see anything. Any help would be much appreciated.
09-04-2013 10:02 AM
If you click on the name of one of the services which is reporting bad health (hdfs1, for example), on that next screen, you'll see a list of nodes in good health or bad health. Click on the bad ones and a window should come up explaining what CM health check is failing. From the ones you posted in your comment, it sounds like there is a network communication problem between the CM server and the other hosts. Can you ping them from the CM server by their hostnames? You can also run the "Host Inspector" tool from the Hosts tab in CM to have it double check network connectivity, etc.
09-04-2013 11:44 AM
Thank you for your response. This is a lab instance, so I have everything installed on the same machine and communication should not be an issue. It does look like the biggest issue they're all highlighting is DNS resolution. Since this is a lab machine, we do not have DNS turned on. Is that going to be an issue? Is there somewhere I can turn that check off? The other concern that they all had in the detailed pages was swapping. I've installed more on this VM than I originally planned, so I'm going to ask the admin to add some more RAM for me and see if that solves that issue. Please let me know if you have any other recommendations.
09-04-2013 01:11 PM
Thanks Meredith, that does help me understand what's going on better.
You will need to be able to resolve your own hostname even if everything's running on a local machine. You can use /etc/hosts in combination with the HOSTNAME=<your hostname> property inside the /etc/sysconfig/network file to accomplish name resolution. Also make sure that the "hosts" line of /etc/nsswitch.conf has a value of "files" listed before "dns".
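If it helps, here is a rough Python sketch of the nsswitch.conf check described above. The function name files_before_dns and the sample text are made up for illustration; on a real machine you would pass in the contents of /etc/nsswitch.conf instead.

```python
def files_before_dns(nsswitch_text):
    """Return True if the 'hosts:' line lists 'files' ahead of 'dns'."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if not line.startswith("hosts:"):
            continue
        sources = line[len("hosts:"):].split()
        if "files" not in sources:
            return False
        # 'files' is present; it is fine if 'dns' is absent or comes later
        return "dns" not in sources or sources.index("files") < sources.index("dns")
    return False  # no 'hosts:' line found at all


# Hypothetical sample, not taken from the poster's machine
SAMPLE_NSSWITCH = """
passwd:     files
hosts:      files dns
"""

print(files_before_dns(SAMPLE_NSSWITCH))
```

To check the real file, something like files_before_dns(open("/etc/nsswitch.conf").read()) should work, assuming the file is readable by your user.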
What does the "hostname" command return on this machine? You should have that hostname listed in /etc/hosts next to the actual IP address of the machine (possibly the eth0 interface?), rather than the loopback address (127.0.0.1). Sometimes VMs will end up with both "localhost" and their actual hostname listed on the loopback line of /etc/hosts and that might confuse the name resolution process.
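To spot the loopback problem mentioned above, a small Python sketch like the following may help. It only parses text in /etc/hosts format and lists every name that sits on a loopback line. The function name loopback_names, the hostname cmhost.example.com, and the address 192.168.1.50 are all placeholders, not values from this thread.

```python
import ipaddress


def loopback_names(hosts_text):
    """Collect every name that /etc/hosts-style text maps to a loopback address."""
    names = set()
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) < 2:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # skip lines that do not start with an IP address
        if addr.is_loopback:
            names.update(fields[1:])
    return names


# Hypothetical example of the "confusing" layout: the real hostname
# appears on the 127.0.0.1 line as well as on the real-IP line.
SAMPLE_HOSTS = """
127.0.0.1     localhost localhost.localdomain cmhost.example.com
192.168.1.50  cmhost.example.com cmhost
"""

print("cmhost.example.com" in loopback_names(SAMPLE_HOSTS))
```

If your own hostname shows up in that set when you feed in your real /etc/hosts, removing it from the loopback line is probably the first thing to try.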
You can test what address your hostname resolves to by issuing this command:
ping `hostname`
NOTE: those are backticks, not regular single quotes.
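If ping is not available, roughly the same lookup can be done from Python using the standard socket module. This is only a sketch of an alternative check, not something Cloudera Manager itself runs; the helper name resolve_own_hostname is made up.

```python
import socket


def resolve_own_hostname():
    """Return (hostname, IPv4 address), or (hostname, None) if resolution fails."""
    name = socket.gethostname()
    try:
        addr = socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        addr = None
    return name, addr


name, addr = resolve_own_hostname()
print(name, "->", addr)
```

If the address printed is 127.0.0.1 or None rather than the machine's real IP, that points back at /etc/hosts.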
09-05-2013 08:09 AM
Thank you again. The ping `hostname` does return the FQDN and the real IP and files is listed before dns in the /etc/nsswitch.conf.
Are there other things that I can check/configure to get my install healthy?