Cloudera Manager 5.3.2 can't start roles on one of the hosts

Expert Contributor

Hi, I restarted the HBase service and one of the RegionServers on a node didn't start.

Here are the logs from the agent:

 

 

[24/May/2015 16:12:34 +0000] 7684 MainThread util         INFO     Extracted 13 files and 0 dirs to /run/cloudera-scm-agent/process/2688-hbase-REGIONSERVER.
[24/May/2015 16:12:34 +0000] 7684 MainThread agent        INFO     Created /run/cloudera-scm-agent/process/2688-hbase-REGIONSERVER/logs
[24/May/2015 16:12:34 +0000] 7684 MainThread agent        INFO     Chowning /run/cloudera-scm-agent/process/2688-hbase-REGIONSERVER/logs to hbase (995) hbase (995)
[24/May/2015 16:12:34 +0000] 7684 MainThread agent        INFO     Chmod'ing /run/cloudera-scm-agent/process/2688-hbase-REGIONSERVER/logs to 0751
[24/May/2015 16:12:34 +0000] 7684 MainThread cgroups      INFO     Creating cgroup /run/cloudera-scm-agent/cgroups/blkio/2688-hbase-REGIONSERVER
[24/May/2015 16:12:34 +0000] 7684 MainThread cgroups      INFO     Creating cgroup /run/cloudera-scm-agent/cgroups/cpuacct/2688-hbase-REGIONSERVER

 

 

The directory /run/cloudera-scm-agent/process/2688-hbase-REGIONSERVER is empty for an unknown reason.

 

Here are the logs from cloudera-manager-server:

2015-05-24 17:12:34,006 INFO 2122353975@scm-web-12491:com.cloudera.cmf.service.ServiceHandlerRegistry: Executing role command Restart BasicCmdArgs{args=[]}.  Service: DbService{id=26, name=hbase} Role: DbRole{id=286, name=hbase-REGIONSERVER-f125dcf94ca5890bacabcee567cf9072, hostName=host02.domain.ru}
2015-05-24 17:12:34,073 INFO 2122353975@scm-web-12491:com.cloudera.cmf.service.ServiceHandlerRegistry: Executing role command Start BasicCmdArgs{args=[]}.  Service: DbService{id=26, name=hbase} Role: DbRole{id=286, name=hbase-REGIONSERVER-f125dcf94ca5890bacabcee567cf9072, hostName=host02.domain.ru}
2015-05-24 17:12:49,670 INFO 336630595@agentServer-15907:com.cloudera.cmf.command.components.StalenessChecker: No staleness check scheduled, scheduling one in 30 seconds
2015-05-24 17:13:19,671 INFO ScheduledStalenessChecker:com.cloudera.cmf.service.ServiceHandlerRegistry: Executing command ProcessStalenessCheckCommand BasicCmdArgs{args=[First reason why: Process (id=2688) has a brand new heartbeat]}.
2015-05-24 17:13:26,107 INFO ProcessStalenessDetector-0:com.cloudera.cmf.service.config.components.ProcessStalenessDetector: Staleness check done. Duration: PT6.339S 
2015-05-24 17:15:06,592 INFO CommandPusher:com.cloudera.cmf.service.AbstractBringUpBringDownCommands: Aborting BringUp command (5241) on service DbService{id=26, name=hbase} role DbRole{id=286, name=hbase-REGIONSERVER-f125dcf94ca5890bacabcee567cf9072, hostName=host02.domain.ru}.
2015-05-24 17:15:19,769 WARN 336630595@agentServer-15907:com.cloudera.server.cmf.AgentProtocolImpl: Received Process Heartbeat for unknown (or duplicate) process. Ignoring. This is expected to happen once after old process eviction or process deletion (as happens in restarts). id=2688 name=null host=5fcc9bbd-9c79-4db4-809d-5590b7427799/host02.domain.ru
2015-05-24 17:16:57,792 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Reaped total of 0 deleted commands
2015-05-24 17:16:57,794 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Found no commands older than 2013-05-24T13:16:57.792Z to reap.
2015-05-24 17:16:57,795 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeScannerService: Reaped 0 requests.
2015-05-24 17:16:57,795 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeConfiguratorService: Reaped 0 requests.

There are more running roles on that node and all their process folders are empty.
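One way to confirm this observation across every role at once is to list the empty directories directly. The following is only a sketch: `list_empty_process_dirs` is a hypothetical helper name, not part of Cloudera Manager, and it relies on `find` options that are GNU/BSD extensions rather than POSIX.

```shell
# Hypothetical helper: print each immediate subdirectory of $1 that is empty.
list_empty_process_dirs() {
    base="$1"
    # -mindepth, -maxdepth and -empty are GNU/BSD find extensions, not POSIX.
    find "$base" -mindepth 1 -maxdepth 1 -type d -empty
}
```

On the affected host this could be called as `list_empty_process_dirs /run/cloudera-scm-agent/process`, run as a user allowed to read that directory.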

What could it be?


Re: Cloudera Manager 5.3.2 can't start roles on one of the hosts

Expert Contributor
A node reboot helped. No idea what it was. We tried restarting cloudera-scm-agent, with no luck. Only a hard reset worked. I feel like a Windows admin...

Re: Cloudera Manager 5.3.2 can't start roles on one of the hosts

Community Manager

I am happy to see you got this sorted out. Thank you for posting your solution as well. :)


Cy Jervis, Community Manager


Re: Cloudera Manager 5.3.2 can't start roles on one of the hosts

Expert Contributor
No problem, but I'm still confused by the solution. We can't just reboot the node each time something weird happens :)

Re: Cloudera Manager 5.3.2 can't start roles on one of the hosts

The process directory is within /var/run/cloudera-scm-agent/process, which
is an in-memory filesystem. You can recreate it by rebooting the box or by
restarting the agent with the clean_restart option, which will restart all
CDH roles on the host as well.

# service cloudera-scm-agent clean_restart

If you face this error again, please check the filesystem's health (free
disk space, whether it is writable, etc.).
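The two checks named above can be sketched as a small shell function. This is an illustration under assumptions, not a Cloudera tool: `check_dir_health` is a hypothetical name, and the write test only reflects the permissions of the user running it.

```shell
# Hypothetical helper: report free space for a directory and test that the
# current user can create a file in it. Returns non-zero on any failure.
check_dir_health() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # POSIX-format free-space report for the filesystem backing the directory.
    df -Pk "$dir"
    # Write test: create and remove a uniquely named probe file.
    probe="$dir/.write_probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
    else
        echo "not writable: $dir"
        return 1
    fi
}
```

For the case in this thread it would be invoked as `check_dir_health /var/run/cloudera-scm-agent/process`, as the same user the agent runs as.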

Regards,
Gautam Gopalakrishnan

Re: Cloudera Manager 5.3.2 can't start roles on one of the hosts

Expert Contributor
service cloudera-scm-agent clean_restart

will try it next time

 

service cloudera-scm-agent restart

didn't help

 

>If you face this error again, please check the filesystem's health

Did it. I copied the commands from the cloudera-scm-agent log and ran them manually, and they worked.

The FS was OK: enough free space and write access enabled.

 

 
