
Unable to start master and join region server - procedure.ServerCrashProcedure: Waiting on master failover to complete


Problem statement

 

I am using the OSS version of Apache HBase 1.3.6 and trying to stand up the Master and RegionServer on separate nodes so that region servers can join the cluster independently. The setup fails: multiple region server entries get spawned from a single node, and each one comes up briefly and then shows as a dead region server.

 

Issue Summary

 

I am running Apache HBase 1.3.6 with the Master and RegionServer on separate nodes, with region servers joining the cluster dynamically. The RegionServer never finishes starting and hangs at "The RegionServer is initializing!"

Commands used (Master and RegionServer are on separate nodes):

Node A - Hbase Master - /opt/hbase/bin/hbase-daemon.sh --config /usr/local/bin/hbase/conf start master

Node B - Hbase Region - /opt/hbase/bin/hbase-daemon.sh --config /usr/local/bin/hbase/conf start regionserver
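For a distributed setup like this, both nodes typically need matching hbase-site.xml entries so they agree on the same HDFS root and ZooKeeper quorum. A minimal sketch is below; the hbase.rootdir value is taken from the master logs further down, but the ZooKeeper hostname is an assumption and must match your actual quorum:

```xml
<!-- hbase-site.xml: minimal fully-distributed sketch (ZooKeeper host is an assumption) -->
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <!-- Matches the rootdir reported in the logs below -->
    <name>hbase.rootdir</name>
    <value>hdfs://10.148.6.68:9000/hbase</value>
  </property>
  <property>
    <!-- Every node must list the same quorum; this hostname is hypothetical -->
    <name>hbase.zookeeper.quorum</name>
    <value>pinpoint-master-v000-rh5k.c.gcp-ushi-telemetry-npe.internal</value>
  </property>
</configuration>
```

If the two nodes disagree on any of these (or one is still in standalone mode with hbase.cluster.distributed=false), the RegionServer can register but never fully join the cluster.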

 

Environment - Google Compute Engine (GCE) instance groups/VMs

OS - CentOS 7

Master ports - 16000/tcp (RPC), 16010 (web UI)

Region server ports - 16020/tcp (RPC), 16030 (web UI)

I am also not sure how to enable reverse DNS between the two machines, or whether that is the cause of the problem. Please advise on how to achieve it.
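Reverse DNS usually works out of the box for internal IPs on GCE, but if lookups fail, a common workaround is static /etc/hosts entries on both nodes. The sketch below uses the IPs and FQDNs that appear in the logs; verify them first on each machine with `getent hosts <ip>` or `dig -x <ip>`:

```
# /etc/hosts on BOTH nodes - map each node's internal IP to its FQDN
# (IPs and hostnames copied from the logs below; adjust to your environment)
10.148.6.154  pinpoint-master-v000-rh5k.c.gcp-ushi-telemetry-npe.internal  pinpoint-master-v000-rh5k
10.148.6.13   pinpoint-r-v000-976s.c.gcp-ushi-telemetry-npe.internal       pinpoint-r-v000-976s
```

HBase compares the hostname a RegionServer reports with what the master resolves for its IP; if forward and reverse lookups disagree, the master can treat each re-registration as a new server, which matches the "dead region servers" symptom described here.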

Master logs:

The master logs below show that the master accepts a connection from the region server and then eventually disconnects it:


2020-04-22 19:38:24,812 DEBUG [RpcServer.listener,port=16000] ipc.RpcServer: RpcServer.listener,port=16000: connection from 10.148.6.13:45732; # active connections: 1
2020-04-22 19:38:24,961 DEBUG [RpcServer.FifoWFPBQ.default.handler=29,queue=2,port=16000] ipc.RpcServer: RpcServer.FifoWFPBQ.default.handler=29,queue=2,port=16000: callId: 0 service: RegionServerStatusService methodName: RegionServerStartup size: 47 connection: 10.148.6.13:45732
2020-04-22 19:38:30,591 DEBUG [*pinpoint-master-v000-rh5k:16000*.activeMasterManager] ipc.RpcClientImpl: Connecting to pinpoint-r-v000-976s.c.gcp-ushi-telemetry-npe.internal/10.148.6.13:16020
2020-04-22 19:38:31,268 DEBUG [hconnection-0x5f02b9cb-shared--pool3-t1] ipc.RpcClientImpl: Connecting to pinpoint-r-v000-976s.c.gcp-ushi-telemetry-npe.internal/10.148.6.13:16020
2020-04-22 19:38:31,478 DEBUG [ProcedureExecutor-3] ipc.RpcClientImpl: Connecting to pinpoint-r-v000-976s.c.gcp-ushi-telemetry-npe.internal/10.148.6.13:16020
2020-04-22 19:39:32,714 DEBUG [RpcServer.reader=1,bindAddress=pinpoint-master-v000-rh5k.c.gcp-ushi-telemetry-npe.internal,port=16000] ipc.RpcServer: RpcServer.listener,port=16000: DISCONNECTING client 10.148.6.13:45732 because read count=-1. Number of active connections: 1

 

Region server logs:

The logs below show that the region server discovers the master on its own but is unable to join the cluster:

===============================================================

 

587584303253 with port=16020, startcode=1587583634667
2020-04-22 19:38:24,801 DEBUG [regionserver/pinpoint-r-v000-976s.c.gcp-ushi-telemetry-npe.internal/10.148.6.13:16020] ipc.RpcClientImpl: Connecting to pinpoint-master-v000-rh5k.c.gcp-ushi-telemetry-npe.internal/10.148.6.154:16000
2020-04-22 19:38:28,005 INFO [regionserver/pinpoint-r-v000-976s.c.gcp-ushi-telemetry-npe.internal/10.148.6.13:16020] regionserver.HRegionServer: reportForDuty to master=pinpoint-master-v000-rh5k.c.gcp-ushi-telemetry-npe.internal,16000,1587584303253 with port=16020, startcode=1587583634667
2020-04-22 19:38:28,033 INFO [regionserver/pinpoint-r-v000-976s.c.gcp-ushi-telemetry-npe.internal/10.148.6.13:16020] regionserver.HRegionServer: Config from master: hbase.rootdir=hdfs://10.148.6.68:9000/hbase
2020-04-22 19:38:28,033 INFO [regionserver/pinpoint-r-v000-976s.c.gcp-ushi-telemetry-npe.internal/10.148.6.13:16020] regionserver.HRegionServer: Config from master: fs.defaultFS=hdfs://10.148.6.68:9000
2020-04-22 19:38:28,033 INFO [regionserver/pinpoint-r-v000-976s.c.gcp-ushi-telemetry-npe.internal/10.148.6.13:16020] regionserver.HRegionServer: Config from master: hbase.master.info.port=16010

===============================================================

 

2020-04-22 19:38:24,801 DEBUG [regionserver/pinpoint-r-v000-976s.c.gcp-ushi-telemetry-npe.internal/10.148.6.13:16020] ipc.RpcClientImpl: Connecting to pinpoint-master-v000-rh5k.c.gcp-ushi-telemetry-npe.internal/10.148.6.154:16000
2020-04-22 19:38:30,592 DEBUG [RpcServer.listener,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: connection from 10.148.6.154:53050; # active connections: 1
2020-04-22 19:38:31,269 DEBUG [RpcServer.listener,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: connection from 10.148.6.154:53052; # active connections: 2
2020-04-22 19:38:31,479 DEBUG [RpcServer.listener,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: connection from 10.148.6.154:53056; # active connections: 3
2020-04-22 19:39:32,413 DEBUG [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] ipc.RpcServer: RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020: callId: 3 service: AdminService methodName: OpenRegion size: 81 connection: 10.148.6.154:53050
2020-04-22 19:39:32,440 DEBUG [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] ipc.RpcServer: RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020: callId: 4 service: AdminService methodName: OpenRegion size: 81 connection: 10.148.6.154:53050
2020-04-22 19:39:32,443 DEBUG [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] ipc.RpcServer: RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020: callId: 5 service: AdminService methodName: OpenRegion size: 81 connection: 10.148.6.154:53050
2020-04-22 19:39:32,445 DEBUG [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] ipc.RpcServer: RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020: callId: 6 service: AdminService methodName: OpenRegion size: 81 connection: 10.148.6.154:53050
2020-04-22 19:39:32,447 DEBUG [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] ipc.RpcServer: RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020: callId: 7 service: AdminService methodName: OpenRegion size: 81 connection: 10.148.6.154:53050
2020-04-22 19:39:32,450 DEBUG [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] ipc.RpcServer: RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020: callId: 8 service: AdminService methodName: OpenRegion size: 81 connection: 10.148.6.154:53050
2020-04-22 19:39:32,452 DEBUG [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] ipc.RpcServer: RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020: callId: 9 service: AdminService methodName: OpenRegion size: 81 connection: 10.148.6.154:53050
2020-04-22 19:39:32,454 DEBUG [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] ipc.RpcServer: RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020: callId: 10 service: AdminService methodName: OpenRegion size: 81 connection: 10.148.6.154:53050
2020-04-22 19:39:32,456 DEBUG [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] ipc.RpcServer: RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020: callId: 11 service: AdminService methodName: OpenRegion size: 81 connection: 10.148.6.154:53050
2020-04-22 19:39:32,458 DEBUG [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020] ipc.RpcServer: RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16020: callId: 12 service: AdminService methodName: OpenRegion size: 81 connection: 10.148.6.154:53050

===============================================================

2020-04-23 04:40:07,751 DEBUG [RpcServer.reader=3,bindAddress=pinpoint-r-v000-976s.c.gcp-ushi-telemetry-npe.internal,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: DISCONNECTING client 10.148.6.13:44272 because read count=-1. Number of active connections: 1
2020-04-23 04:40:17,751 DEBUG [RpcServer.listener,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: connection from 10.148.6.13:44280; # active connections: 1
2020-04-23 04:40:17,752 DEBUG [RpcServer.reader=4,bindAddress=pinpoint-r-v000-976s.c.gcp-ushi-telemetry-npe.internal,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: DISCONNECTING client 10.148.6.13:44280 because read count=-1. Number of active connections: 1
2020-04-23 04:40:27,752 DEBUG [RpcServer.listener,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: connection from 10.148.6.13:44282; # active connections: 1
2020-04-23 04:40:27,752 DEBUG [RpcServer.reader=5,bindAddress=pinpoint-r-v000-976s.c.gcp-ushi-telemetry-npe.internal,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: DISCONNECTING client 10.148.6.13:44282 because read count=-1. Number of active connections: 1
2020-04-23 04:40:37,752 DEBUG [RpcServer.listener,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: connection from 10.148.6.13:44284; # active connections: 1

 

1 Reply
Re: Unable to start master and join region server - procedure.ServerCrashProcedure: Waiting on master failover to complete


As seen in the HBase Master Web UI screenshot below, I am observing odd behavior where the same region server gets spawned multiple times and ends up in the "Dead Region Servers" list.

 

Screenshot 2020-04-23 at 10.13.03 PM.png

 

And in the region server console I see that the process stalls at "The RegionServer is initializing!"

 

 
