Can phoenix local indexes create a deadlock during an HBase restart?


Hi Guys,

I have been testing Phoenix local indexes and I'm facing an issue after restarting the entire HBase cluster.

Scenario: I'm using Ambari 2.1.2 and HDP 2.3 with Phoenix 4.4 and HBase 1.1.1. My test cluster contains 10 machines, and the main table has 300 pre-split regions, which implies 300 regions on the local index table as well. To configure Phoenix I'm following this tutorial.

When I start a fresh cluster everything is fine: the local index is created and I can insert data and query it using the index. The problem comes when I need to restart the cluster to update some configuration; at that point I'm no longer able to bring the cluster back up. Most of the servers log exceptions like the one below. It looks like they get into a state where some region servers are waiting for regions that are not yet available on other region servers (kind of a deadlock).

INFO  [htable-pool7-t1] client.AsyncProcess: #5, table=_LOCAL_IDX_BIDDING_EVENTS, attempt=27/350 failed=1ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region _LOCAL_IDX_BIDDING_EVENTS,57e4b17e4b17e4ac,1451943466164.253bdee3695b566545329fa3ac86d05e. is not online on ip-10-5-4-24.ec2.internal,16020,1451996088952
	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
	at org.apache.hadoop.hbase.ipc.RpcExecutor$
 on ip-10-5-4-24.ec2.internal,16020,1451942002174, tracking started null, retrying after=20001ms, replay=1ops
INFO  [ip-10-5-4-26.ec2.internal,16020,1451996087089-recovery-writer--pool5-t1] client.AsyncProcess: #3, waiting for 2  actions to finish
INFO  [ip-10-5-4-26.ec2.internal,16020,1451996087089-recovery-writer--pool5-t2] client.AsyncProcess: #4, waiting for 2  actions to finish

While a server is logging these exceptions, I can see this message (I checked the size of this file and it is very small):

Description: Replaying edits from hdfs://.../recovered.edits/0000000000000464197
Status: Running pre-WAL-restore hook in coprocessors (since 48mins, 45sec ago)

Another interesting thing that I noticed is the empty coprocessor list for the servers that are stuck.

On the other hand, the HBase master goes down after logging some of these messages:

GeneralBulkAssigner: Failed bulking assigning N regions

Any help would be awesome 🙂

Thank you




Hi @Pedro Gandola

This problem occurs when the meta regions are not assigned yet and the preScannerOpen coprocessor hook waits to read the meta table for the local indexes. The open-region threads then wait forever, which produces the deadlock.

You can solve this by increasing the number of threads used to open regions, so that the meta regions can still be assigned even while the threads opening the local index table's regions are waiting. This removes the deadlock.

<property>
  <name>hbase.regionserver.executor.openregion.threads</name>
  <value>100</value>
</property>




@Artem Ervits, not for now, as we don't yet recommend using local indexes in production. Local indexes will probably be ready for production in the next HDP release (though this is not certain), and the connection made during preScannerOpen (which accesses the meta/namespace tables) will be moved to a different place to avoid the problem above.