Created 05-24-2015 06:32 AM
Hi, I restarted the HBase service, and one of the RegionServers (RS) on a node didn't start.
Here are logs from agent:
[24/May/2015 16:12:34 +0000] 7684 MainThread util INFO Extracted 13 files and 0 dirs to /run/cloudera-scm-agent/process/2688-hbase-REGIONSERVER.
[24/May/2015 16:12:34 +0000] 7684 MainThread agent INFO Created /run/cloudera-scm-agent/process/2688-hbase-REGIONSERVER/logs
[24/May/2015 16:12:34 +0000] 7684 MainThread agent INFO Chowning /run/cloudera-scm-agent/process/2688-hbase-REGIONSERVER/logs to hbase (995) hbase (995)
[24/May/2015 16:12:34 +0000] 7684 MainThread agent INFO Chmod'ing /run/cloudera-scm-agent/process/2688-hbase-REGIONSERVER/logs to 0751
[24/May/2015 16:12:34 +0000] 7684 MainThread cgroups INFO Creating cgroup /run/cloudera-scm-agent/cgroups/blkio/2688-hbase-REGIONSERVER
[24/May/2015 16:12:34 +0000] 7684 MainThread cgroups INFO Creating cgroup /run/cloudera-scm-agent/cgroups/cpuacct/2688-hbase-REGIONSERVER
The directory
/run/cloudera-scm-agent/process/2688-hbase-REGIONSERVER
is empty for an unknown reason.
Here are logs from cloudera-manager-server:
2015-05-24 17:12:34,006 INFO 2122353975@scm-web-12491:com.cloudera.cmf.service.ServiceHandlerRegistry: Executing role command Restart BasicCmdArgs{args=[]}. Service: DbService{id=26, name=hbase} Role: DbRole{id=286, name=hbase-REGIONSERVER-f125dcf94ca5890bacabcee567cf9072, hostName=host02.domain.ru}
2015-05-24 17:12:34,073 INFO 2122353975@scm-web-12491:com.cloudera.cmf.service.ServiceHandlerRegistry: Executing role command Start BasicCmdArgs{args=[]}. Service: DbService{id=26, name=hbase} Role: DbRole{id=286, name=hbase-REGIONSERVER-f125dcf94ca5890bacabcee567cf9072, hostName=host02.domain.ru}
2015-05-24 17:12:49,670 INFO 336630595@agentServer-15907:com.cloudera.cmf.command.components.StalenessChecker: No staleness check scheduled, scheduling one in 30 seconds
2015-05-24 17:13:19,671 INFO ScheduledStalenessChecker:com.cloudera.cmf.service.ServiceHandlerRegistry: Executing command ProcessStalenessCheckCommand BasicCmdArgs{args=[First reason why: Process (id=2688) has a brand new heartbeat]}.
2015-05-24 17:13:26,107 INFO ProcessStalenessDetector-0:com.cloudera.cmf.service.config.components.ProcessStalenessDetector: Staleness check done. Duration: PT6.339S
2015-05-24 17:15:06,592 INFO CommandPusher:com.cloudera.cmf.service.AbstractBringUpBringDownCommands: Aborting BringUp command (5241) on service DbService{id=26, name=hbase} role DbRole{id=286, name=hbase-REGIONSERVER-f125dcf94ca5890bacabcee567cf9072, hostName=host02.domain.ru}.
2015-05-24 17:15:19,769 WARN 336630595@agentServer-15907:com.cloudera.server.cmf.AgentProtocolImpl: Received Process Heartbeat for unknown (or duplicate) process. Ignoring. This is expected to happen once after old process eviction or process deletion (as happens in restarts). id=2688 name=null host=5fcc9bbd-9c79-4db4-809d-5590b7427799/host02.domain.ru
2015-05-24 17:16:57,792 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Reaped total of 0 deleted commands
2015-05-24 17:16:57,794 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Found no commands older than 2013-05-24T13:16:57.792Z to reap.
2015-05-24 17:16:57,795 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeScannerService: Reaped 0 requests.
2015-05-24 17:16:57,795 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeConfiguratorService: Reaped 0 requests.
There are more running roles on that node and all their process folders are empty.
What could be causing this?
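For anyone wanting to confirm which role process folders on a node are empty, here is a minimal Python sketch. The base path is the one that appears in the agent log above; the function name `empty_process_dirs` is made up for illustration and is not part of Cloudera Manager. Reading that directory will probably require root.

```python
import os


def empty_process_dirs(base):
    """Return the sorted names of subdirectories of `base` that have no entries."""
    empty = []
    with os.scandir(base) as entries:
        for entry in entries:
            # Only look at real directories, not files or symlinks.
            if entry.is_dir(follow_symlinks=False) and not os.listdir(entry.path):
                empty.append(entry.name)
    return sorted(empty)


if __name__ == "__main__":
    # Path taken from the agent log in this post; adjust if your agent uses another location.
    base = "/run/cloudera-scm-agent/process"
    if os.path.isdir(base):
        for name in empty_process_dirs(base):
            print(name)
```

If every role folder is listed, the problem is likely with the agent or the filesystem under /run rather than with HBase itself.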
Created 05-24-2015 12:41 PM
Created 05-25-2015 06:35 AM
I am happy to see you got this sorted out. Thank you for posting your solution as well. :)
Created 05-25-2015 07:32 AM
Created 05-26-2015 12:01 AM
Created 05-26-2015 01:48 AM
service cloudera-scm-agent clean_restart
I will try it next time.
service cloudera-scm-agent restart
did not help.
>If you face this error again, please check the filesystem's health
Did it. I copy-pasted the commands from the cloudera-scm-agent log and ran them manually, and they worked.
The filesystem was OK: enough free space, and write access was enabled.
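The free space and write access check described above could be scripted roughly like this. It is only a sketch: the function name `check_dir` and the 100 MiB threshold are assumptions for illustration, not anything the agent itself uses.

```python
import shutil
import tempfile


def check_dir(path, min_free_bytes=100 * 1024 * 1024):
    """Report free space on the filesystem holding `path` and whether a file can be created there."""
    usage = shutil.disk_usage(path)
    try:
        # Actually create (and automatically delete) a temporary file,
        # which is a stronger test than only reading permission bits.
        with tempfile.NamedTemporaryFile(dir=path):
            writable = True
    except OSError:
        writable = False
    return {
        "free_bytes": usage.free,
        "enough_space": usage.free >= min_free_bytes,  # threshold is an assumed value
        "writable": writable,
    }


if __name__ == "__main__":
    import os

    target = "/run/cloudera-scm-agent/process"  # path from the agent log in this thread
    if os.path.isdir(target):
        print(check_dir(target))
```

Run it as the same user the agent runs as, otherwise the write test says little about what the agent can do.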