HDP 3.1.4: Inconsistent HBase region state causes region server downtime

Explorer

 

 

2023-02-23 09:59:54,031 ERROR [RpcServer.default.FPBQ.Fifo.handler=189,queue=9,port=16000] master.MasterRpcServices: Region server hadoop-08,16020,1676022229147 reported a fatal error:
***** ABORTING region server hadoop-08,16020,1676022229147: org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, location=hadoop-09,16020,1676022233663, table=hh_app_hbase_poc_tag:label_d_common_data_20221229, region=2d10db72bf694a8af42a38f62ae13c7b reported OPEN on server=hadoop-08,16020,1676022229147 but state has otherwise.
at org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1036)
at org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:960)
at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:466)
at org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: rit=OPEN, location=hadoop-09,16020,1676022233663, table=hh_app_hbase_poc_tag:label_d_common_data_20221229, region=2d10db72bf694a8af42a38f62ae13c7b reported OPEN on server=hadoop-08,16020,1676022229147 but state has otherwise.

 

These are my HMaster error logs. Has anyone encountered this problem, and how can it be solved?

I also found another thread where the error is the same as mine:

https://community.cloudera.com/t5/Support-Questions/After-Upgrading-to-HDP-3-0-1-Hbase-balancer-stuc...
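For readers comparing their own logs, the mismatch in the HMaster message above can be pulled apart with a short script. This is only a sketch: the regular expression is written against the exact message quoted in this thread, and it assumes that `location=` is the server the master has on record for the region while `server=` is the one that reported it as open. `parse_mismatch` is a hypothetical helper name, not part of HBase.

```python
import re

# Exception detail copied from the HMaster log quoted above.
MESSAGE = (
    "rit=OPEN, location=hadoop-09,16020,1676022233663, "
    "table=hh_app_hbase_poc_tag:label_d_common_data_20221229, "
    "region=2d10db72bf694a8af42a38f62ae13c7b reported OPEN on "
    "server=hadoop-08,16020,1676022229147 but state has otherwise."
)

# A server name is printed as host,port,startcode.
SERVER = r"[^,\s]+,\d+,\d+"
PATTERN = re.compile(
    rf"rit=(?P<state>\w+), location=(?P<location>{SERVER}), "
    rf"table=(?P<table>\S+), region=(?P<region>[0-9a-f]+) "
    rf"reported (?P<reported_state>\w+) on server=(?P<reporter>{SERVER})"
)


def parse_mismatch(message):
    """Return the fields of the mismatch message, or None if it does not match."""
    match = PATTERN.search(message)
    return match.groupdict() if match else None


info = parse_mismatch(MESSAGE)
if info is not None:
    print("region:               ", info["region"])
    print("master has it on:     ", info["location"])
    print("reported as open by:  ", info["reporter"])
    print("same server:          ", info["location"] == info["reporter"])
```

For the message in this thread, the two server names differ (hadoop-09 versus hadoop-08), which is the inconsistency the master complains about.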

 

2 REPLIES

Contributor

Hello,


org.apache.hadoop.hbase.YouAreDeadException normally occurs when a Region Server loses communication, or takes too long to report its availability through its znode, which can happen for different reasons [1].

 

You may want to check the Region Server logs, confirm that the ZooKeeper service is not crashing, and verify that the ZK timeouts are set properly [2].
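One thing worth keeping in mind when checking the timeouts: the ZooKeeper server bounds the session timeout a client asks for. By default the lower bound (`minSessionTimeout`) is 2 × `tickTime` and the upper bound (`maxSessionTimeout`) is 20 × `tickTime`, so a large `zookeeper.session.timeout` in hbase-site.xml may not be what the Region Server actually gets. A minimal sketch of that arithmetic follows; the numbers are illustrative only and are not taken from the poster's cluster, and `negotiated_session_timeout_ms` is a hypothetical helper name.

```python
def negotiated_session_timeout_ms(requested_ms, tick_time_ms,
                                  min_ms=None, max_ms=None):
    """Clamp a requested session timeout to the ZooKeeper server's bounds.

    When minSessionTimeout / maxSessionTimeout are not set explicitly,
    the server defaults are 2 * tickTime and 20 * tickTime.
    """
    lower = 2 * tick_time_ms if min_ms is None else min_ms
    upper = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lower, min(requested_ms, upper))


# Illustrative values (assumptions, not read from any real configuration).
requested = 90_000   # zookeeper.session.timeout requested by HBase, in ms
tick_time = 2_000    # tickTime on the ZooKeeper servers, in ms

effective = negotiated_session_timeout_ms(requested, tick_time)
print(f"requested {requested} ms, effective {effective} ms")
```

With these example values the effective timeout is 40000 ms rather than the requested 90000 ms, so both the HBase and the ZooKeeper settings need to be looked at together.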

 

Hope this helps.

 

[1] https://issues.apache.org/jira/browse/HBASE-25274 

[2] https://community.cloudera.com/t5/Customer/What-is-the-formula-to-calculate-ZooKeeper-timeouts-for/t... 

Explorer

Hello,

There are no ZK-related exceptions in my region server log. The only notable entry is the "stopping server" message, which appears right after the region is closed.

The following is the log of my region server:

2023-02-23 09:59:54,281 INFO [RS_CLOSE_REGION-regionserver/hadoop-08:16020-1] regionserver.HRegion: Closed hh_app_hbase_poc_tag:label_d_common_data_20221229,018_1818111,1674980891241.2d10db72bf694a8af42a38f62ae13c7b.
2023-02-23 09:59:54,282 INFO [RS_CLOSE_REGION-regionserver/hadoop-08:16020-0] hbase.RangerAuthorizationCoprocessor: Unable to get remote Address
2023-02-23 09:59:54,457 INFO [regionserver/hadoop-08:16020] regionserver.HRegionServer: stopping server hadoop-08,16020,1676022229147; all regions closed.
2023-02-23 09:59:54,478 WARN [Close-WAL-Writer-304] asyncfs.FanOutOneBlockAsyncDFSOutputHelper: lease for file /apps/hbase/data/WALs/hadoop-08,16020,1676022229147/hadoop-08%2C16020%2C1676022229147.1677116687197 is expired, give up

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): Client (=DFSClient_NONMAPREDUCE_1647216046_1) is not the lease owner (=DFSClient_NONMAPREDUCE_1918766927_1: /apps/hbase/data/WALs/hadoop-08,16020,1676022229147-splitting/hadoop-08%2C16020%2C1676022229147.1677116687197 (inode 175124591) [Lease. Holder: DFSClient_NONMAPREDUCE_1647216046_1, pending creates: 1].
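To see how the HMaster and region server entries quoted in this thread line up in time, the timestamps can be merged into one timeline. This is a rough sketch: the log lines are shortened copies of the ones above (thread names and long paths replaced with "..."), and it assumes the clocks on both hosts are in sync.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = len("2023-02-23 09:59:54,031")

# Shortened copies of the log lines quoted in this thread.
EVENTS = [
    ("regionserver", "2023-02-23 09:59:54,281 INFO regionserver.HRegion: Closed ...2d10db72bf694a8af42a38f62ae13c7b."),
    ("regionserver", "2023-02-23 09:59:54,457 INFO regionserver.HRegionServer: stopping server hadoop-08,16020,1676022229147; all regions closed."),
    ("regionserver", "2023-02-23 09:59:54,478 WARN asyncfs.FanOutOneBlockAsyncDFSOutputHelper: lease for file ... is expired, give up"),
    ("master", "2023-02-23 09:59:54,031 ERROR master.MasterRpcServices: Region server hadoop-08,16020,1676022229147 reported a fatal error"),
]


def parse_line(source, line):
    """Split a log line into (timestamp, source, remaining text)."""
    ts = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    return ts, source, line[TS_LENGTH:].strip()


timeline = sorted(parse_line(source, line) for source, line in EVENTS)
first = timeline[0][0]

# Offset of each event from the earliest one, in whole milliseconds.
offsets_ms = [(ts - first) // timedelta(milliseconds=1) for ts, _, _ in timeline]

for offset, (ts, source, text) in zip(offsets_ms, timeline):
    print(f"+{offset:4d} ms  {source:12s}  {text}")
```

In this ordering the master logs the fatal error first, and the region close, server stop, and WAL lease warning on hadoop-08 all follow within about half a second.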