
After hard restart cant start any services from manager


Explorer

I have an issue with all of our services. We did a firmware upgrade on our machine, and now every service reports:

"No host heartbeat; CDH versions cannot be verified."

I also cannot restart the services through the manager. They either time out or report that they cannot communicate with the name node:

"Command aborted because of exception: Command timed-out after 150 seconds"

Here is the log for our cluster. Just a heads up: the cluster is one machine, and the data nodes are VMs.

The log file is too big to post, so please see it here:
http://pastebin.com/CznqykHF


Re: After hard restart cant start any services from manager

Explorer
Sorry, I should have listed my steps. I ran ifconfig on each node and tested whether the nodes:
- can ping each other (they can)
- still show the same settings as before the reboot (they do)

Then I checked the hosts file to ensure each node resolves to the correct FQDN and IP address.

Regarding the firewalls:

- I checked SELinux, and it was off on all nodes.
- I checked iptables, and it was still off as well.
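
For anyone repeating the hosts-file check on several nodes, here is a minimal shell sketch of it. The function name hosts_lookup is made up for illustration; it only reads a hosts-format file and never queries DNS.

```shell
#!/usr/bin/env bash
# Sketch: print the IP that a hosts-format file maps a name to (first match).
# hosts_lookup is a hypothetical helper; the file defaults to /etc/hosts.
hosts_lookup() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v n="$name" '
        { sub(/#.*/, "") }   # strip comments before looking at the fields
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$file"
}
```

Running `hosts_lookup "$(hostname -f)"` on each node and comparing the result with the address shown by ifconfig is one way to confirm the entries agree. `getenforce` reports the current SELinux mode if you want to script that check too.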

What would you recommend trying after these steps?

Re: After hard restart cant start any services from manager

Expert Contributor

Hi Charles, thanks for the cloudera-scm-server.log; it's most helpful. It reads as if the other side of the heartbeat communication, the cloudera-scm-agent service, is not running yet or failed to start for some reason on 'usorla7hp1106x'. Could you pastebin /var/log/cloudera-scm-agent/cloudera-scm-agent.log and /var/log/cloudera-scm-agent/supervisord.log? Redact any IPs, hostnames, or IDs you feel necessary. The last 500-1000 lines of each may be sufficient.

 

Also, what's the output of

# service cloudera-scm-agent status

# ps -ef | grep supervisord
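
If it helps with the redaction, here is a rough sketch of a filter that masks one hostname and anything that looks like an IPv4 address. redact_log is a hypothetical helper, not a Cloudera tool, and the IPv4 pattern is deliberately loose, so it may also mask dotted version strings.

```shell
#!/usr/bin/env bash
# Sketch: mask one hostname and IPv4-looking strings in text read from stdin.
# redact_log is a hypothetical helper name used only for illustration.
redact_log() {
    local host="$1"
    [ -n "$host" ] || { echo "usage: redact_log HOSTNAME" >&2; return 2; }
    local esc="${host//./\\.}"   # escape dots so they match literally
    sed -E -e "s/${esc}/REDACTED-HOST/g" \
           -e 's/([0-9]{1,3}\.){3}[0-9]{1,3}/REDACTED-IP/g'
}
```

For example: `tail -n 1000 /var/log/cloudera-scm-agent/cloudera-scm-agent.log | redact_log "$(hostname -f)"`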

 

Thanks


Re: After hard restart cant start any services from manager

Explorer

Thanks for the quick reply.

 

(/var/log/hadoop-yarn)-1179> service cloudera-scm-agent status
cloudera-scm-agent (pid 59385) is running...

 

g/hadoop-yarn)-1180> ps -ef | grep supervisord
root 59419 1 0 12:00 ? 00:00:00 /usr/lib64/cmf/agent/src/cmf/../../build/env/bin/python /usr/lib64/cmf/agent/src/cmf/../../build/env/bin/supervisord
root 65096 37187 0 12:48 pts/3 00:00:00 grep supervisord

 

Hostnames have been replaced with CORRECT-FQDN-HERE.net.

Here is the log file you requested for cloudera-scm-agent.log

http://pastebin.com/Uma4A4zD

 

Here is the log file for cloudera supervisord.log

http://pastebin.com/PuV9dSbV

Re: After hard restart cant start any services from manager

Expert Contributor

Great. 
 
This appears to be the crux:

 

12/Aug/2014 12:52:03 +0000 59385 MainThread agent        ERROR    Heartbeating to CORRECT-FQDN-HERE.net:7182 failed.
 
Something is preventing the agent from heartbeating to port 7182 on the node where cloudera-scm-server runs. You said this is a single-node cluster though, right? In other words, is this cloudera-scm-agent running locally alongside cloudera-scm-server on CORRECT-FQDN-HERE.net?
 
You've already covered iptables and SELinux. Could anything else have changed with the reboot, such as the FQDN? Have you made any changes to /etc/cloudera-scm-agent/config.ini before, during, or after this reboot?
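
To narrow this down, a sketch along these lines may help. Both function names are hypothetical. The first pulls the latest heartbeat failure out of the agent log (the default path is the one mentioned earlier in this thread), and the second checks a socket listing piped in on stdin for a listener on a given port.

```shell
#!/usr/bin/env bash
# Sketch: print the most recent heartbeat failure line from an agent log.
last_heartbeat_failure() {
    local log="${1:-/var/log/cloudera-scm-agent/cloudera-scm-agent.log}"
    grep 'Heartbeating to .* failed' "$log" | tail -n 1
}

# Sketch: succeed if a socket listing on stdin shows the given local port.
has_listener() {
    local port="$1"
    grep -Eq "[:.]${port}[[:space:]]"
}
```

On the node where cloudera-scm-server runs, `netstat -tln | has_listener 7182 && echo "listening on 7182"` shows whether anything is accepting connections on the heartbeat port.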

Re: After hard restart cant start any services from manager

Explorer

This is a single machine; however, I am running three VMs (KVM) that act as datanodes, and the "name node" just sits on the main machine. Question: am I supposed to run the agent separately from the cloudera-scm-server?

 

I have received little information from my department regarding the firmware upgrade (let alone a notice, lol!).

 

hostname, /etc/hosts, ifconfig, and host -v -t A hostname all match up, so nothing has changed...

 

The file you mentioned still looks the same as when I left it.

 

I basically went home on a Friday with everything operational and came back on Monday to find the manager FUBAR'd. Is there anything I can do to start these services manually? Perhaps Cloudera Manager is not communicating correctly with them?

 

EDIT: Also, is there something the agent needs in order to run successfully? Maybe another service that has to be running?

 

Re: After hard restart cant start any services from manager

Can you double check the following please:

- /etc/cloudera-scm-agent/config.ini should have the hostname or IP address of the machine where Cloudera Manager runs

- if you see a host name above, ensure you can resolve it correctly from the slaves

 

# ping CORRECT-FQDN-HERE.net

 

# telnet CORRECT-FQDN-HERE.net 7182

 

The "host" command won't consult /etc/hosts, so you need to use ping or something similarly simple that just calls gethostbyname().
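
The same checks can be scripted, roughly as below. This is only a sketch: the function names are made up, it assumes config.ini uses plain key=value lines with server_host and server_port keys, and it uses getent (which goes through the system resolver and so honors /etc/hosts when nsswitch.conf lists "files") plus bash's /dev/tcp in place of telnet.

```shell
#!/usr/bin/env bash
# Sketch: print the value of a key from an ini-style file (first live match).
ini_value() {
    local file="$1" key="$2"
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | head -n 1
}

# Sketch: resolve a name through the system resolver and print the first IP.
resolve_host() {
    getent hosts "$1" | awk '{ print $1; exit }'
}

# Sketch: return 0 if a TCP connection to host:port succeeds within 5 seconds.
port_open() {
    local host="$1" port="$2"
    timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

On a slave you could then run something like `h=$(ini_value /etc/cloudera-scm-agent/config.ini server_host); resolve_host "$h"; port_open "$h" 7182 && echo reachable`.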

Regards,
Gautam Gopalakrishnan