
HDP 3.1.4: Inconsistent HBase region state results in region server downtime

Explorer

2023-02-23 09:59:54,031 ERROR [RpcServer.default.FPBQ.Fifo.handler=189,queue=9,port=16000] master.MasterRpcServices: Region server hadoop-08,16020,1676022229147 reported a fatal error:
***** ABORTING region server hadoop-08,16020,1676022229147: org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, location=hadoop-09,16020,1676022233663, table=hh_app_hbase_poc_tag:label_d_common_data_20221229, region=2d10db72bf694a8af42a38f62ae13c7b reported OPEN on server=hadoop-08,16020,1676022229147 but state has otherwise.
at org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1036)
at org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:960)
at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:466)
at org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: rit=OPEN, location=hadoop-09,16020,1676022233663, table=hh_app_hbase_poc_tag:label_d_common_data_20221229, region=2d10db72bf694a8af42a38f62ae13c7b reported OPEN on server=hadoop-08,16020,1676022229147 but state has otherwise.

 

These are my HMaster error logs. Has anyone encountered this problem, and how did you solve it?

Here is another post I found; its error is the same as mine:

https://community.cloudera.com/t5/Support-Questions/After-Upgrading-to-HDP-3-0-1-Hbase-balancer-stuc...

 


Contributor

Hello,


org.apache.hadoop.hbase.YouAreDeadException normally occurs when a region server loses communication with ZooKeeper, or takes too long reporting its availability through its znode, which can happen for several different reasons [1].

 

You may want to check what is in the region server logs, verify that the ZooKeeper service is not crashing, and confirm that the ZK timeouts are properly set [2].
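As a minimal sketch of where the ZK session timeout lives, assuming typical HDP defaults (the value below is an example only, not a recommendation):

```xml
<!-- hbase-site.xml (example value only; tune to your cluster) -->
<property>
  <name>zookeeper.session.timeout</name>
  <!-- How long before the master declares a region server dead.
       ZooKeeper caps the effective value at roughly 20x its tickTime,
       so raising this may also require raising tickTime in zoo.cfg. -->
  <value>90000</value>
</property>
```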

 

Hope this helps.

 

[1] https://issues.apache.org/jira/browse/HBASE-25274 

[2] https://community.cloudera.com/t5/Customer/What-is-the-formula-to-calculate-ZooKeeper-timeouts-for/t... 
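The inconsistency itself (a region reported OPEN on one server while the master's state says otherwise) can also be inspected and, if needed, repaired with the HBCK2 tool on HBase 2.x. A hedged sketch, assuming the hbase-hbck2 jar is available on the master host (the jar path is a placeholder; the encoded region name is taken from the logs above):

```shell
# Check the region's state as recorded in hbase:meta
echo "scan 'hbase:meta', {ROWPREFIXFILTER => 'hh_app_hbase_poc_tag:label_d_common_data_20221229'}" \
  | hbase shell -n

# With HBCK2: correct the master's view of the region state, then re-assign it
hbase hbck -j /path/to/hbase-hbck2.jar setRegionState 2d10db72bf694a8af42a38f62ae13c7b CLOSED
hbase hbck -j /path/to/hbase-hbck2.jar assigns 2d10db72bf694a8af42a38f62ae13c7b
```

Use setRegionState with care: it edits hbase:meta directly, so it should only be run when the master's view is genuinely out of sync.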

Explorer

Hello,

There is no ZK-related exception in my region server log; only the "stopping server" message appears after the regions are closed.

The following is the log of my region server:

2023-02-23 09:59:54,281 INFO [RS_CLOSE_REGION-regionserver/hadoop-08:16020-1] regionserver.HRegion: Closed hh_app_hbase_poc_tag:label_d_common_data_20221229,018_1818111,1674980891241.2d10db72bf694a8af42a38f62ae13c7b.
2023-02-23 09:59:54,282 INFO [RS_CLOSE_REGION-regionserver/hadoop-08:16020-0] hbase.RangerAuthorizationCoprocessor: Unable to get remote Address
2023-02-23 09:59:54,457 INFO [regionserver/hadoop-08:16020] regionserver.HRegionServer: stopping server hadoop-08,16020,1676022229147; all regions closed.
2023-02-23 09:59:54,478 WARN [Close-WAL-Writer-304] asyncfs.FanOutOneBlockAsyncDFSOutputHelper: lease for file /apps/hbase/data/WALs/hadoop-08,16020,1676022229147/hadoop-08%2C16020%2C1676022229147.1677116687197 is expired, give up

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): Client (=DFSClient_NONMAPREDUCE_1647216046_1) is not the lease owner (=DFSClient_NONMAPREDUCE_1918766927_1: /apps/hbase/data/WALs/hadoop-08,16020,1676022229147-splitting/hadoop-08%2C16020%2C1676022229147.1677116687197 (inode 175124591) [Lease. Holder: DFSClient_NONMAPREDUCE_1647216046_1, pending creates: 1].