MW
New Contributor
Posts: 8
Registered: ‎08-29-2013

Bad Health Statuses after install

I just finished my install and have many bad health statuses and cannot determine why.  

  • ZooKeeper's health is bad and I do not see anything too troubling in the logs. I do see the following message for 2 different ports, but plenty of healthy connections: "EndOfStreamException: Unable to read additional data from client sessionid 0x140e567690c0014, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:662)"

 

  • Both HDFS and MapReduce are in bad health. I looked through all of the logs associated with both, and the only exceptions I saw anywhere were ones counting down the time until safe mode was to be turned off.

 

  • My mgmt1 service has Bad Health, and inside it I see the following information: 6 bad.

The health of the Activity Monitor is bad. The following health checks are bad: host health.
The health of the Service Monitor is bad. The following health checks are bad: host health.
The health of the Host Monitor is bad. The following health checks are bad: host health.
The health of the Event Server is bad. The following health checks are bad: host health.
The health of the Alert Publisher is bad. The following health checks are bad: host health.
The health of the Reports Manager is bad. The following health checks are bad: host health.

I'm not sure which log to look into for more information on this, but I don't see anything troubling in cloudera-scm-server/cloudera-scm-server.log.

 

 

Are all of these messages appearing because I haven't configured or started using Hadoop yet? Or is something else misconfigured? I looked in the manual for help but didn't see anything. Any help would be much appreciated.

 

Thank you,

Meredith

Posts: 416
Topics: 51
Kudos: 83
Solutions: 49
Registered: ‎06-26-2013

Re: Bad Health Statuses after install

Meredith,

 

  If you click on the name of one of the services that is reporting bad health (hdfs1, for example), the next screen will show a list of hosts in good or bad health. Click on the bad ones, and a window should come up explaining which CM health check is failing. From the messages you posted, it sounds like there is a network communication problem between the CM server and the other hosts. Can you ping them from the CM server by their hostnames? You can also run the "Host Inspector" tool from the Hosts tab in CM to have it double-check network connectivity, etc.
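
For a quick manual check, something like this from the CM server's shell would confirm basic name resolution (the hostnames below are placeholders; substitute your actual hosts):

ping -c 3 node1.example.com
ping -c 3 node2.example.com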

 

Clint

MW
New Contributor
Posts: 8
Registered: ‎08-29-2013

Re: Bad Health Statuses after install

Clint,

Thank you for your response.  This is a lab instance, so I have everything installed on the same machine, and communication should not be an issue.  It does look like the biggest issue the checks are all highlighting is DNS resolution.  Since this is a lab machine, we do not have DNS turned on; is that going to be a problem?  Is there somewhere I can turn that check off?  The other concern raised on all of the detailed pages was swapping.  I've installed more on this VM than I originally planned, so I'm going to ask the admin to add some more RAM for me and see if that solves that issue.  Please let me know if you have any other recommendations.
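
In case it's useful while we wait on the RAM, these are the standard Linux commands I can run to watch memory and swap (nothing Cloudera-specific):

free -m
swapon -s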

 

Thank you,

Meredith

Posts: 416
Topics: 51
Kudos: 83
Solutions: 49
Registered: ‎06-26-2013

Re: Bad Health Statuses after install

Thanks Meredith, that does help me understand what's going on better.

 

You will need to be able to resolve your own hostname even if everything is running on a single machine.  You can use /etc/hosts in combination with the HOSTNAME=<your hostname> property in the /etc/sysconfig/network file to accomplish name resolution.  Also make sure that the "hosts" line of /etc/nsswitch.conf lists "files" before "dns".
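
For example, assuming a hostname of node1.example.com (substitute your real one), the relevant entries would look something like this:

In /etc/sysconfig/network:
HOSTNAME=node1.example.com

In /etc/nsswitch.conf:
hosts: files dns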

 

What does the "hostname" command return on this machine?  You should have that hostname listed in /etc/hosts next to the actual IP address of the machine (possibly the eth0 interface's address), rather than the loopback address (127.0.0.1).  Sometimes VMs end up with both "localhost" and their actual hostname listed on the loopback line of /etc/hosts, and that can confuse name resolution.
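
To make that concrete, here is the loopback pattern that causes trouble and a layout that usually works (the IP address and hostname below are examples only):

Problematic:
127.0.0.1   localhost node1.example.com

Better:
127.0.0.1      localhost localhost.localdomain
192.168.1.10   node1.example.com node1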

 

You can test which address your hostname resolves to by issuing this command:

 

ping `hostname`

 

NOTE: those are backticks, not regular single quotes.
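
If name resolution is set up correctly, the output should show your FQDN resolving to the machine's real address rather than the loopback, along these lines (the addresses here are illustrative):

PING node1.example.com (192.168.1.10) 56(84) bytes of data.
64 bytes from node1.example.com (192.168.1.10): icmp_seq=1 ttl=64 time=0.045 ms

If you see 127.0.0.1 there instead, revisit the /etc/hosts layout above.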

 

HTH

MW
New Contributor
Posts: 8
Registered: ‎08-29-2013

Re: Bad Health Statuses after install

Thank you again.  ping `hostname` does return the FQDN and the real IP, and "files" is listed before "dns" in /etc/nsswitch.conf.

 

Are there other things that I can check/configure to get my install healthy?

Posts: 416
Topics: 51
Kudos: 83
Solutions: 49
Registered: ‎06-26-2013

Re: Bad Health Statuses after install

What did the Host Inspector report say?  It lets you drill into the various configs that are problematic and usually gets pretty specific.

MW
New Contributor
Posts: 8
Registered: ‎08-29-2013

Re: Bad Health Statuses after install

The Host Inspector report was all green checkmarks; no issues were reported...
