Created on 07-12-2017 10:13 AM - edited 09-16-2022 04:55 AM
Hi.
We have been facing this issue many times till now. All region servers go down and then Hbase master when try to do log splitting it throws out error and entire hbase conenctivity is lost until we restart the cluster.Did anyone face this kind of issue ,it would be great if we know the root cause of why servers are going down and why Hbase master is not able to make the region servers active through log splitting?
2017-07-11 00:03:32,380 WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in [hdfs://nameservice1/hbase/WALs/hostnamedn01.com,60020,1438807980302-splitting] installed = 1 but only 0 done
2017-07-11 00:03:32,381 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
java.io.IOException: failed log splitting for hostnamedn01.com,60020,1438807980302, will retry
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.resubmit(ServerShutdownHandler.java:346)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:219)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error or interrupted while splitting logs in [hdfs://nameservice1/hbase/WALs/hostnamedn01.com,60020,1438807980302-splitting] Task = installed = 1 done = 0 error = 1
at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:291)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:391)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:364)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:286)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:212)
... 4 more
2017-07-11 00:03:32,385 INFO org.apache.hadoop.hbase.coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fhostnamedn05.com%2C60020%2C1438807980454-splitting%2Fhostnamedn05.com%252C60020%252C1438807980454.1438865586074 acquired by hostnamedn05.com,60020,1498483521347
2017-07-11 00:03:32,477 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for hostnamedn01.com,60020,1438807980302 before assignment; region count=0
2017-07-11 00:03:32,481 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog workers [hostnamedn01.com,60020,1438807980302]
2017-07-11 00:03:32,482 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting 1 logs in [hdfs://nameservice1/hbase/WALs/hostnamedn01.com,60020,1438807980302-splitting] for [hostnamedn01.com,60020,1438807980302]
2017-07-11 00:03:37,505 INFO org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 4 unassigned = 2 tasks={/hbase/splitWAL/WALs%2Fhostnamedn03.com%2C60020%2C1438807980381-splitting%2Fhostnamedn03.com%252C60020%252C1438807980381.1438865589887=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0, /hbase/splitWAL/WALs%2Fhostnamedn02.com%2C60020%2C1438807980385-splitting%2Fhostnamedn02.com%252C60020%252C1438807980385.1438865587122=last_update = 1499749403398 last_version = 30 cur_worker_name = hostnamedn05.com,60020,1498483521347 status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0, /hbase/splitWAL/WALs%2Fhostnamedn05.com%2C60020%2C1438807980454-splitting%2Fhostnamedn05.com%252C60020%252C1438807980454.1438865586074=last_update = 1499749412430 last_version = 2 cur_worker_name = hostnamedn05.com,60020,1498483521347 status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0, /hbase/splitWAL/WALs%2Fhostnamedn01.com%2C60020%2C1438807980302-splitting%2Fhostnamedn01.com%252C60020%252C1438807980302.1438865586081=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2017-07-11 00:03:43,505 INFO org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 4 unassigned = 2 tasks={/hbase/splitWAL/WALs%2Fhostnamedn03.com%2C60020%2C1438807980381-splitting%2Fhostnamedn03.com%252C60020%252C1438807980381.1438865589887=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0, /hbase/splitWAL/WALs%2Fhostnamedn02.com%2C60020%2C1438807980385-splitting%2Fhostnamedn02.com%252C60020%252C1438807980385.1438865587122=last_update = 1499749403398 last_version = 30 cur_worker_name = hostnamedn05.com,60020,1498483521347 status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0, /hbase/splitWAL/WALs%2Fhostnamedn05.com%2C60020%2C1438807980454-splitting%2Fhostnamedn05.com%252C60020%252C1438807980454.1438865586074=last_update = 1499749412430 last_version = 2 cur_worker_name = hostnamedn05.com,60020,1498483521347 status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0, /hbase/splitWAL/WALs%2Fhostnamedn01.com%2C60020%2C1438807980302-splitting%2Fhostnamedn01.com%252C60020%252C1438807980302.1438865586081=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2017-07-11 00:03:49,411 INFO org.apache.hadoop.hbase.coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fhostnamedn02.com%2C60020%2C1438807980385-splitting%2Fhostnamedn02.com%252C60020%252C1438807980385.1438865587122 entered state: ERR hostnamedn05.com,60020,1498483521347
2017-07-11 00:03:49,412 WARN org.apache.hadoop.hbase.coordination.SplitLogManagerCoordination: Error splitting /hbase/splitWAL/WALs%2Fhostnamedn02.com%2C60020%2C1438807980385-splitting%2Fhostnamedn02.com%252C60020%252C1438807980385.1438865587122
2017-07-11 00:03:49,412 WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in [hdfs://nameservice1/hbase/WALs/hostnamedn02.com,60020,1438807980385-splitting] installed = 1 but only 0 done
2017-07-11 00:03:49,413 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
java.io.IOException: failed log splitting for hostnamedn02.com,60020,1438807980385, will retry
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.resubmit(ServerShutdownHandler.java:346)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:219)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error or interrupted while splitting logs in [hdfs://nameservice1/hbase/WALs/hostnamedn02.com,60020,1438807980385-splitting] Task = installed = 1 done = 0 error = 1
at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:291)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:391)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:364)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:286)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:212)
Created 07-22-2017 10:24 AM
Created 05-21-2019 03:36 AM
I have similar problem can you please help
logs of HMaster-
2019-05-21 15:16:39,745 WARN [main-EventThread] coordination.SplitLogManagerCoordination: Error splitting /hbase/splitWAL/WALs%2F10.136.107.153%2C16020%2C1558423482915-splitting%2F10.136.107.153%252C16020%252C1558423482915.default.1558426329243
2019-05-21 15:16:39,745 WARN [MASTER_SERVER_OPERATIONS-DIGPUNPERHEL02:16000-0] master.SplitLogManager: error while splitting logs in [hdfs://10.136.107.59:9000/hbase/WALs/10.136.107.153,16020,1558423482915-splitting] installed = 3 but only 0 done
2019-05-21 15:16:39,747 ERROR [MASTER_SERVER_OPERATIONS-DIGPUNPERHEL02:16000-0] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
java.io.IOException: failed log splitting for 10.136.107.153,16020,1558423482915, will retry
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.resubmit(ServerShutdownHandler.java:357)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:220)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error or interrupted while splitting logs in [hdfs://10.136.107.59:9000/hbase/WALs/10.136.107.153,16020,1558423482915-splitting] Task = installed = 3 done = 0 error = 3
at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:391)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:364)
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:286)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:213)
... 4 more
2019-05-21 15:16:39,748 FATAL [MASTER_SERVER_OPERATIONS-DIGPUNPERHEL02:16000-0] master.HMaster: Master server abort: loaded coprocessors are: []
2019-05-21 15:16:39,748 FATAL [MASTER_SERVER_OPERATIONS-DIGPUNPERHEL02:16000-0] master.HMaster: Caught throwable while processing event M_SERVER_SHUTDOWN
java.io.IOException: failed log splitting for 10.136.107.153,16020,1558423482915, will retry
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.resubmit(ServerShutdownHandler.java:357)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:220)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error or interrupted while splitting logs in [hdfs://10.136.107.59:9000/hbase/WALs/10.136.107.153,16020,1558423482915-splitting] Task = installed = 3 done = 0 error = 3
Logs of HRegionServer -
2019-05-21 15:16:38,669 INFO [RS_LOG_REPLAY_OPS-DIGPUNPERHEL02:16020-0-Writer-1] wal.WALSplitter: Creating writer path=hdfs://10.136.107.59:9000/hbase/data/dev2/observation/8baa93c0a9ddc9ab4ebfead1a50d85b2/recovered.edits/0000000000002354365.temp region=8baa93c0a9ddc9ab4ebfead1a50d85b2
2019-05-21 15:16:38,701 WARN [Thread-101] hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hbase/data/dev2/observation/8baa93c0a9ddc9ab4ebfead1a50d85b2/recovered.edits/0000000000002354365.temp could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)