Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

HBase Master: Failed to become active master

avatar
Contributor

HBase Master cannot start up!!

Here is the error (trace):

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2015-05-08 09:31:02,675 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hadoop/lib/native
2015-05-08 09:31:02,675 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2015-05-08 09:31:02,675 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2015-05-08 09:31:02,675 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
2015-05-08 09:31:02,675 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
2015-05-08 09:31:02,675 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=2.6.32-431.el6.x86_64
2015-05-08 09:31:02,676 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=hbase
2015-05-08 09:31:02,676 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/var/lib/hbase
2015-05-08 09:31:02,676 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/var/run/cloudera-scm-agent/process/1831-hbase-MASTER
2015-05-08 09:31:02,676 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=node1:2181,master:2181,node2:2181 sessionTimeout=60000 watcher=master:600000x0, quorum=node1:2181,master:2181,node2:2181, baseZNode=/hbase
2015-05-08 09:31:02,687 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server node1/10.15.230.41:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-08 09:31:02,690 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to node1/10.15.230.41:2181, initiating session
2015-05-08 09:31:02,695 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server node1/10.15.230.41:2181, sessionid = 0x24d3137ed890570, negotiated timeout = 60000
2015-05-08 09:31:02,729 INFO org.apache.hadoop.hbase.ipc.RpcServer: RpcServer.responder: starting
2015-05-08 09:31:02,729 INFO org.apache.hadoop.hbase.ipc.RpcServer: RpcServer.listener,port=60000: starting
2015-05-08 09:31:02,776 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2015-05-08 09:31:02,780 INFO org.apache.hadoop.hbase.http.HttpRequestLog: Http request log for http.requests.master is not defined
2015-05-08 09:31:02,788 INFO org.apache.hadoop.hbase.http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter)
2015-05-08 09:31:02,790 INFO org.apache.hadoop.hbase.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context master
2015-05-08 09:31:02,790 INFO org.apache.hadoop.hbase.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2015-05-08 09:31:02,791 INFO org.apache.hadoop.hbase.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2015-05-08 09:31:02,803 INFO org.apache.hadoop.hbase.http.HttpServer: Jetty bound to port 60010
2015-05-08 09:31:02,803 INFO org.mortbay.log: jetty-6.1.26.cloudera.4
2015-05-08 09:31:03,061 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:60010
2015-05-08 09:31:03,063 INFO org.apache.hadoop.hbase.master.HMaster: hbase.rootdir=hdfs://master:8020/hbase, hbase.cluster.distributed=true
2015-05-08 09:31:03,073 INFO org.apache.hadoop.hbase.master.HMaster: Adding backup master ZNode /hbase/backup-masters/master,60000,1431091861873
2015-05-08 09:31:03,142 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Deleting ZNode for /hbase/backup-masters/master,60000,1431091861873 from backup master directory
2015-05-08 09:31:03,149 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Registered Active Master=master,60000,1431091861873
2015-05-08 09:31:03,152 INFO org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-08 09:31:03,161 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x348d62d4 connecting to ZooKeeper ensemble=node1:2181,master:2181,node2:2181
2015-05-08 09:31:03,162 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=node1:2181,master:2181,node2:2181 sessionTimeout=60000 watcher=hconnection-0x348d62d40x0, quorum=node1.net:2181,master:2181,node2:2181, baseZNode=/hbase
2015-05-08 09:31:03,162 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server node2/10.15.230.42:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-08 09:31:03,163 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to node2/10.15.230.42:2181, initiating session
2015-05-08 09:31:03,164 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server node2/10.15.230.42:2181, sessionid = 0x34d3137ea45057a, negotiated timeout = 60000
2015-05-08 09:31:03,184 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: ClusterId : d72a7eb0-8dba-485b-a8bc-2fbf5a182ed7
2015-05-08 09:31:03,365 INFO org.apache.hadoop.hbase.fs.HFileSystem: Added intercepting call to namenode#getBlockLocations so can do block reordering using class class org.apache.hadoop.hbase.fs.HFileSystem$ReorderWALBlocks
2015-05-08 09:31:03,370 INFO org.apache.hadoop.hbase.coordination.SplitLogManagerCoordination: Found 0 orphan tasks and 0 rescan nodes
2015-05-08 09:31:03,383 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x7ae2a774 connecting to ZooKeeper ensemble=node1:2181,master:2181,node2:2181
2015-05-08 09:31:03,383 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=node1:2181,master:2181,node2:2181 sessionTimeout=60000 watcher=hconnection-0x7ae2a7740x0, quorum=node1:2181,master:2181,node2:2181, baseZNode=/hbase
2015-05-08 09:31:03,385 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server node1/10.15.230.41:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-08 09:31:03,385 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to node1/10.15.230.41:2181, initiating session
2015-05-08 09:31:03,386 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server node1/10.15.230.41:2181, sessionid = 0x24d3137ed890571, negotiated timeout = 60000
2015-05-08 09:31:03,398 INFO org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: loading config
2015-05-08 09:31:03,439 INFO org.apache.hadoop.hbase.master.HMaster: Server active/primary master=master,60000,1431091861873, sessionid=0x24d3137ed890570, setting cluster-up flag (Was=true)
2015-05-08 09:31:03,452 INFO org.apache.hadoop.hbase.procedure.ZKProcedureUtil: Clearing all procedure znodes: /hbase/online-snapshot/acquired /hbase/online-snapshot/reached /hbase/online-snapshot/abort
2015-05-08 09:31:03,458 INFO org.apache.hadoop.hbase.procedure.ZKProcedureUtil: Clearing all procedure znodes: /hbase/flush-table-proc/acquired /hbase/flush-table-proc/reached /hbase/flush-table-proc/abort
2015-05-08 09:31:03,485 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=replicationLogCleaner connecting to ZooKeeper ensemble=node1:2181,master:2181,node2:2181
2015-05-08 09:31:03,485 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=node1:2181,master:2181,node2:2181 sessionTimeout=60000 watcher=replicationLogCleaner0x0, quorum=node1:2181,master:2181,node2:2181, baseZNode=/hbase
2015-05-08 09:31:03,486 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server node1/10.15.230.41:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-08 09:31:03,486 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to node1/10.15.230.41:2181, initiating session
2015-05-08 09:31:03,487 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server node1/10.15.230.41:2181, sessionid = 0x24d3137ed890572, negotiated timeout = 60000
2015-05-08 09:31:03,495 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2015-05-08 09:31:04,998 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 1503 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2015-05-08 09:31:06,300 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=node2,60020,1431091828858
2015-05-08 09:31:06,300 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=node3,60020,1431091829281
2015-05-08 09:31:06,300 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=node5,60020,1431091828696
2015-05-08 09:31:06,301 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=node4,60020,1431091828684
2015-05-08 09:31:06,301 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=node1,60020,1431091828790
2015-05-08 09:31:06,301 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 4, slept for 2806 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2015-05-08 09:31:06,352 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 5, slept for 2857 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2015-05-08 09:31:07,855 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 5, slept for 4360 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2015-05-08 09:31:08,006 INFO org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of 2147483647, master is running
2015-05-08 09:31:08,012 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://master:8020/hbase/WALs/node1,60020,1431091828790 belongs to an existing region server
2015-05-08 09:31:08,013 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://master:8020/hbase/WALs/node2,60020,1431091828858 belongs to an existing region server
2015-05-08 09:31:08,015 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://master:8020/hbase/WALs/node3,60020,1431091829281 belongs to an existing region server
2015-05-08 09:31:08,016 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://master:8020/hbase/WALs/node4,60020,1431091828684 belongs to an existing region server
2015-05-08 09:31:08,017 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://master:8020/hbase/WALs/node5,60020,1431091828696 belongs to an existing region server
2015-05-08 09:31:08,081 INFO org.apache.hadoop.hbase.master.RegionStates: Transition {1588230740 state=OFFLINE, ts=1431091868026, server=null} to {1588230740 state=OPEN, ts=1431091868081, server=node3,60020,1431091829281}
2015-05-08 09:31:08,083 INFO org.apache.hadoop.hbase.master.ServerManager: AssignmentManager hasn't finished failover cleanup; waiting
2015-05-08 09:31:08,084 INFO org.apache.hadoop.hbase.master.HMaster: hbase:meta assigned=0, rit=false, location=node3,60020,1431091829281
2015-05-08 09:31:08,128 INFO org.apache.hadoop.hbase.MetaMigrationConvertingToPB: hbase:meta doesn't have any entries to update.
2015-05-08 09:31:08,128 INFO org.apache.hadoop.hbase.MetaMigrationConvertingToPB: META already up-to date with PB serialization
2015-05-08 09:31:08,145 INFO org.apache.hadoop.hbase.master.AssignmentManager: Clean cluster startup. Assigning user regions
2015-05-08 09:31:08,150 INFO org.apache.hadoop.hbase.master.AssignmentManager: Joined the cluster in 22ms, failover=false
2015-05-08 09:31:08,161 INFO org.apache.hadoop.hbase.master.TableNamespaceManager: Namespace table not found. Creating...
2015-05-08 09:31:08,189 FATAL org.apache.hadoop.hbase.master.HMaster: Failed to become active master
org.apache.hadoop.hbase.TableExistsException: hbase:namespace
        at org.apache.hadoop.hbase.master.handler.CreateTableHandler.checkAndSetEnablingTable(CreateTableHandler.java:152)
        at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:125)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:233)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:897)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:739)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:169)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1469)
        at java.lang.Thread.run(Thread.java:745)
2015-05-08 09:31:08,195 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2015-05-08 09:31:08,195 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

1 ACCEPTED SOLUTION

avatar
Rising Star

So ZK is working fine. Perfect.

 

Make sure HBase is down and clear everything:

hadoop fs -rm -r /hbase/*

echo "rmr /hbase" | zookeeper-client

 

Then try to start HBase.

 

JMS

View solution in original post

29 REPLIES 29

avatar
Contributor

I have restarted all services and agents but NO luck!

 

Here is the latest HBASE-MASTER log:

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2015-05-10 09:43:44,744 INFO org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 5, slept for 4529 ms, expecting minimum of 1, maximum of 2147483647, master is running
2015-05-10 09:43:44,752 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://master:8020/hbase/WALs/node1,60020,1431265418332 belongs to an existing region server
2015-05-10 09:43:44,754 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://master:8020/hbase/WALs/node2,60020,1431265418290 belongs to an existing region server
2015-05-10 09:43:44,756 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://master:8020/hbase/WALs/node3,60020,1431265418644 belongs to an existing region server
2015-05-10 09:43:44,758 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://master:8020/hbase/WALs/node4,60020,1431265418152 belongs to an existing region server
2015-05-10 09:43:44,759 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://master:8020/hbase/WALs/node5,60020,1431265418158 belongs to an existing region server
2015-05-10 09:43:44,834 INFO org.apache.hadoop.hbase.zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at address=node3,60020,1431091829281, exception=org.apache.hadoop.hbase.NotServingRegionException: Region hbase:meta,,1 is not online on node3,60020,1431265418644
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2791)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:886)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegionInfo(RSRpcServices.java:1168)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:20862)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)
2015-05-10 09:43:44,839 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog workers [node3,60020,1431091829281]
2015-05-10 09:43:44,841 INFO org.apache.hadoop.hbase.master.SplitLogManager: hdfs://master:8020/hbase/WALs/node3,60020,1431091829281-splitting is empty dir, no logs to split
2015-05-10 09:43:44,842 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting 0 logs in [hdfs://master:8020/hbase/WALs/node3,60020,1431091829281-splitting] for [node3,60020,1431091829281]
2015-05-10 09:43:44,844 INFO org.apache.hadoop.hbase.master.SplitLogManager: finished splitting (more than or equal to) 0 bytes in 0 log files in [hdfs://master:8020/hbase/WALs/node3,60020,1431091829281-splitting] in 2ms
2015-05-10 09:43:44,844 INFO org.apache.hadoop.hbase.zookeeper.MetaTableLocator: Deleting hbase:meta region location in ZooKeeper
2015-05-10 09:43:44,865 INFO org.apache.hadoop.hbase.master.AssignmentManager: Assigning hbase:meta,,1.1588230740 to node2,60020,1431265418290
2015-05-10 09:43:44,865 INFO org.apache.hadoop.hbase.master.RegionStates: Transition {1588230740 state=OFFLINE, ts=1431265424859, server=null} to {1588230740 state=PENDING_OPEN, ts=1431265424865, server=node2,60020,1431265418290}
2015-05-10 09:43:44,907 INFO org.apache.hadoop.hbase.master.ServerManager: AssignmentManager hasn't finished failover cleanup; waiting
2015-05-10 09:43:44,918 INFO org.apache.hadoop.hbase.master.RegionStates: Transition {1588230740 state=PENDING_OPEN, ts=1431265424865, server=node2,60020,1431265418290} to {1588230740 state=OPENING, ts=1431265424918, server=node2,60020,1431265418290}
2015-05-10 09:43:45,126 INFO org.apache.hadoop.hbase.master.RegionStates: Transition {1588230740 state=OPENING, ts=1431265424918, server=node2,60020,1431265418290} to {1588230740 state=OPEN, ts=1431265425126, server=node2,60020,1431265418290}
2015-05-10 09:43:45,128 INFO org.apache.hadoop.hbase.coordination.ZkOpenRegionCoordination: Handling OPENED of 1588230740 from master,60000,1431265418462; deleting unassigned node
2015-05-10 09:43:45,132 INFO org.apache.hadoop.hbase.master.HMaster: hbase:meta assigned=1, rit=false, location=node2,60020,1431265418290
2015-05-10 09:43:45,200 INFO org.apache.hadoop.hbase.MetaMigrationConvertingToPB: hbase:meta doesn't have any entries to update.
2015-05-10 09:43:45,200 INFO org.apache.hadoop.hbase.MetaMigrationConvertingToPB: META already up-to date with PB serialization
2015-05-10 09:43:45,218 INFO org.apache.hadoop.hbase.master.AssignmentManager: Clean cluster startup. Assigning user regions
2015-05-10 09:43:45,223 INFO org.apache.hadoop.hbase.master.AssignmentManager: Joined the cluster in 23ms, failover=false
2015-05-10 09:43:45,233 INFO org.apache.hadoop.hbase.master.TableNamespaceManager: Namespace table not found. Creating...
2015-05-10 09:43:45,256 FATAL org.apache.hadoop.hbase.master.HMaster: Failed to become active master
org.apache.hadoop.hbase.TableExistsException: hbase:namespace
        at org.apache.hadoop.hbase.master.handler.CreateTableHandler.checkAndSetEnablingTable(CreateTableHandler.java:152)
        at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:125)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:233)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:897)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:739)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:169)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1469)
        at java.lang.Thread.run(Thread.java:745)
2015-05-10 09:43:45,262 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2015-05-10 09:43:45,262 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.TableExistsException: hbase:namespace
        at org.apache.hadoop.hbase.master.handler.CreateTableHandler.checkAndSetEnablingTable(CreateTableHandler.java:152)
        at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:125)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:233)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:897)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:739)
        at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:169)
        at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1469)
        at java.lang.Thread.run(Thread.java:745)
2015-05-10 09:43:45,262 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled exception. Starting shutdown.

avatar
Rising Star

Hi,

 

I think there is no need to restart everything for now.

 

We need to focus on your ZK issue.

 

You can stop HBase until it's fixed.

 

Can you share the ZK logs (/var/log/zookeeper/)?

 

How is ZK looking in CM? All green? How many ZK servers?

 

JMS

avatar
Contributor

There are 2 ZK servers: 1 Leader and 2 Followers.

CM reports all good (green) about ZK.

The only hickup reported by CM it was when I restarted ZK, it turned red for a less than 1 minute and the turned green and has stayed up green since then.

Here is the error that reported at that moment:

<< The ZooKeeper service canary failed for an unknown reason. >>

 

Here is some part of the ZK logs:

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2015-05-10 09:20:55,205 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:37531
2015-05-10 09:20:55,207 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x14d3dfb495f0001 with negotiated timeout 30000 for client /127.0.0.1:37531
2015-05-10 09:21:02,370 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:37531 which had sessionid 0x14d3dfb495f0001
2015-05-10 09:33:13,657 INFO org.apache.zookeeper.server.quorum.QuorumPeerConfig: Reading configuration from: /var/run/cloudera-scm-agent/process/1848-zookeeper-server/zoo.cfg
2015-05-10 09:33:13,668 INFO org.apache.zookeeper.server.quorum.QuorumPeerConfig: Defaulting to majority quorums
2015-05-10 09:33:13,671 INFO org.apache.zookeeper.server.DatadirCleanupManager: autopurge.snapRetainCount set to 5
2015-05-10 09:33:13,671 INFO org.apache.zookeeper.server.DatadirCleanupManager: autopurge.purgeInterval set to 24
2015-05-10 09:33:13,673 INFO org.apache.zookeeper.server.DatadirCleanupManager: Purge task started.
2015-05-10 09:33:13,682 INFO org.apache.zookeeper.server.quorum.QuorumPeerMain: Starting quorum peer
2015-05-10 09:33:13,683 INFO org.apache.zookeeper.server.DatadirCleanupManager: Purge task completed.
2015-05-10 09:33:13,689 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
2015-05-10 09:33:13,696 INFO org.apache.zookeeper.server.quorum.QuorumPeer: tickTime set to 3000
2015-05-10 09:33:13,696 INFO org.apache.zookeeper.server.quorum.QuorumPeer: minSessionTimeout set to 4000
2015-05-10 09:33:13,696 INFO org.apache.zookeeper.server.quorum.QuorumPeer: maxSessionTimeout set to 60000
2015-05-10 09:33:13,696 INFO org.apache.zookeeper.server.quorum.QuorumPeer: initLimit set to 80
2015-05-10 09:33:13,703 INFO org.apache.zookeeper.server.persistence.FileSnap: Reading snapshot /shared/D12/zookeeper/version-2/snapshot.1400002ac1
2015-05-10 09:33:13,852 INFO org.apache.zookeeper.server.quorum.QuorumCnxManager: My election bind port: 0.0.0.0/0.0.0.0:3888
2015-05-10 09:33:13,861 INFO org.apache.zookeeper.server.quorum.QuorumPeer: LOOKING
2015-05-10 09:33:13,862 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: New election. My id =  1, proposed zxid=0x1500000012
2015-05-10 09:33:13,865 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 1 (n.leader), 0x1500000012 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x15 (n.peerEPoch), LOOKING (my state)
2015-05-10 09:33:13,866 INFO org.apache.zookeeper.server.quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (2, 1)
2015-05-10 09:33:13,866 INFO org.apache.zookeeper.server.quorum.QuorumCnxManager: Have smaller server identifier, so dropping the connection: (3, 1)
2015-05-10 09:33:13,867 INFO org.apache.zookeeper.server.quorum.QuorumCnxManager: Received connection request /10.15.230.41:41581
2015-05-10 09:33:13,868 INFO org.apache.zookeeper.server.quorum.QuorumCnxManager: Received connection request /10.15.230.42:41463
2015-05-10 09:33:13,869 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 3 (n.leader), 0x1500000012 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x15 (n.peerEPoch), LOOKING (my state)
2015-05-10 09:33:13,869 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 3 (n.leader), 0x1500000012 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x15 (n.peerEPoch), LOOKING (my state)
2015-05-10 09:33:13,870 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 3 (n.leader), 0x1500000012 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x15 (n.peerEPoch), LOOKING (my state)
2015-05-10 09:33:14,070 INFO org.apache.zookeeper.server.quorum.QuorumPeer: FOLLOWING
2015-05-10 09:33:14,074 INFO org.apache.zookeeper.server.quorum.Learner: TCP NoDelay set to: true
2015-05-10 09:33:14,078 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:zookeeper.version=3.4.5-cdh5.4.0--1, built on 04/21/2015 19:11 GMT
2015-05-10 09:33:14,078 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:host.name=DAST-master.am.hedani.net
2015-05-10 09:33:14,078 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:java.version=1.7.0_55
2015-05-10 09:33:14,078 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:java.vendor=Oracle Corporation
2015-05-10 09:33:14,078 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:java.home=/usr/java/jdk1.7.0_55-cloudera/jre
2015-05-10 09:33:14,078 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:java.class.path=/var/run/cloudera-scm-agent/process/1848-zookeeper-server:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/zookeeper/lib/log4j.jar:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/zookeeper/build/*:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/zookeeper/build/lib/*:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/zookeeper/zookeeper-3.4.5-cdh5.4.0.jar:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/zookeeper/zookeeper.jar:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/zookeeper/lib/jline-2.11.jar:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/zookeeper/lib/netty-3.2.2.Final.jar:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/zookeeper/lib/slf4j-api-1.7.5.jar:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/zookeeper/lib/log4j-1.2.16.jar:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/zookeeper/lib/slf4j-log4j12.jar:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar:/usr/share/cmf/lib/plugins/event-publish-5.4.0-shaded.jar:/usr/share/cmf/lib/plugins/tt-instrumentation-5.4.0.jar:/usr/share/cmf/lib/plugins/cdh5/audit-plugin-cdh5-2.3.0-shaded.jar
2015-05-10 09:33:14,078 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2015-05-10 09:33:14,078 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:java.io.tmpdir=/tmp

2015-05-10 09:33:14,079 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:java.compiler=<NA>
2015-05-10 09:33:14,079 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:os.name=Linux
2015-05-10 09:33:14,079 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:os.arch=amd64
2015-05-10 09:33:14,079 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:os.version=2.6.32-431.el6.x86_64
2015-05-10 09:33:14,079 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:user.name=zookeeper
2015-05-10 09:33:14,079 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:user.home=/var/lib/zookeeper
2015-05-10 09:33:14,079 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:user.dir=/var/run/cloudera-scm-agent/process/1848-zookeeper-server
2015-05-10 09:33:14,080 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server with tickTime 3000 minSessionTimeout 4000 maxSessionTimeout 60000 datadir /var/log/zookeeper/version-2 snapdir /shared/D12/zookeeper/version-2
2015-05-10 09:33:14,080 INFO org.apache.zookeeper.server.quorum.Learner: FOLLOWING - LEADER ELECTION TOOK - 218
2015-05-10 09:33:14,166 INFO org.apache.zookeeper.server.quorum.Learner: Getting a snapshot from leader
2015-05-10 09:33:14,173 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 0x1500000012 to /shared/D12/zookeeper/version-2/snapshot.1500000012
2015-05-10 09:35:35,390 WARN org.apache.zookeeper.server.quorum.Learner: Got zxid 0x1600000001 expected 0x1
2015-05-10 09:35:35,391 INFO org.apache.zookeeper.server.persistence.FileTxnLog: Creating new log file: log.1600000001
2015-05-10 09:35:35,448 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.15.230.22:53415
2015-05-10 09:35:35,453 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.15.230.22:53415
2015-05-10 09:35:35,457 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x14d3e07963c0000 with negotiated timeout 30000 for client /10.15.230.22:53415
2015-05-10 09:35:35,517 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.15.230.22:53415 which had sessionid 0x14d3e07963c0000
2015-05-10 09:35:54,650 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:38065
2015-05-10 09:35:54,652 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:38065
2015-05-10 09:35:54,654 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x14d3e07963c0001 with negotiated timeout 30000 for client /127.0.0.1:38065
2015-05-10 09:36:22,336 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:38065 which had sessionid 0x14d3e07963c0001
2015-05-10 09:37:05,421 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.15.230.22:53514
2015-05-10 09:37:05,422 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.15.230.22:53514
2015-05-10 09:37:05,423 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x14d3e07963c0002 with negotiated timeout 30000 for client /10.15.230.22:53514
2015-05-10 09:37:05,443 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.15.230.22:53514 which had sessionid 0x14d3e07963c0002
2015-05-10 09:38:35,342 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.15.230.22:53606
2015-05-10 09:38:35,342 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.15.230.22:53606
2015-05-10 09:38:35,345 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x14d3e07963c0003 with negotiated timeout 30000 for client /10.15.230.22:53606
2015-05-10 09:38:35,354 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.15.230.22:53609
2015-05-10 09:38:35,354 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.15.230.22:53609
2015-05-10 09:38:35,356 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x14d3e07963c0004 with negotiated timeout 30000 for client /10.15.230.22:53609
2015-05-10 09:38:35,428 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.15.230.22:53609 which had sessionid 0x14d3e07963c0004
2015-05-10 09:38:35,432 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.15.230.22:53606 which had sessionid 0x14d3e07963c0003
2015-05-10 09:39:35,367 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.15.230.22:53706
2015-05-10 09:39:35,368 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.15.230.22:53706
2015-05-10 09:39:35,369 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x14d3e07963c0005 with negotiated timeout 30000 for client /10.15.230.22:53706
2015-05-10 09:39:35,392 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.15.230.22:53706 which had sessionid 0x14d3e07963c0005
2015-05-10 09:40:35,387 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.15.230.22:53813
2015-05-10 09:40:35,387 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.15.230.22:53813
2015-05-10 09:40:35,389 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x14d3e07963c0006 with negotiated timeout 30000 for client /10.15.230.22:53813
2015-05-10 09:40:35,439 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.15.230.22:53813 which had sessionid 0x14d3e07963c0006
2015-05-10 09:41:40,361 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /10.15.230.22:53920
2015-05-10 09:41:40,361 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /10.15.230.22:53920
2015-05-10 09:41:40,362 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x14d3e07963c0007 with negotiated timeout 30000 for client /10.15.230.22:53920
2015-05-10 09:41:40,423 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.15.230.22:53920 which had sessionid 0x14d3e07963c0007

avatar
Rising Star

When you type "zookeeper-client" what's the output? Does it ends with something like:

2015-05-10 17:08:53,831 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@975] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-10 17:08:53,832 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@852] - Socket connection established to localhost/127.0.0.1:2181, initiating session
2015-05-10 17:08:53,842 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x14d3c2b942f0793, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

 

Can you try to type "help"?

And "ls /"

And "ls /hbase" ?

 

I don't see any error in your logs. Only INFO.

 

Thanks.

 

 

avatar
Contributor

Now I am able to execute commands!

 

Here they are:

 

ls /
[hive_zookeeper_namespace_hive0, solr, accumulo, hbase, zookeeper, zkdtsm]
ls /hbase
[meta-region-server, backup-masters, draining, table, region-in-transition, running, table-lock, hbaseid, namespace, online-snapshot, replication, splitWAL, recovering-regions, rs, flush-table-proc]

 

Thank you.

 

avatar
Rising Star

So ZK is working fine. Perfect.

 

Make sure HBase is down and clear everything:

hadoop fs -rm -r /hbase/*

echo "rmr /hbase" | zookeeper-client

 

Then try to start HBase.

 

JMS

avatar
Contributor

Hi JM, that worked!!

I am now able to startup HBase with all its servers!

 

Thanks for your assistance and persistance 🙂

 

Cheers,

TS

 

avatar
Rising Star

You're very welcome!

 

Glad it now works for you. Enjoy your cluster!

 

Best,

 

JMS

avatar
New Contributor

I'm new to Cloudera Cluster.  I read somewhere that you should not use "hadoop fs -rm -r /hbase/*" in a production enviroment.  Can you please give me detail what "hadoop fs -rm -r /hbase/*" does?  Is there any data lost or data remove to any of the services or database? 

 

Thanks,

 

-sr

avatar
New Contributor

I have same doubt what happens when i delete "hdfs dfs  -rm -r /hbase/*" will lost data/namespaces in HBASE or HDFS?