In Part 1 of this article series, we discussed the internals of index maintenance. In this part, we cover some of the major issues we face during the life cycle of index maintenance.
Before we get into the issues, we need to understand the various "states" of an index table, which reflect its health in general.
BUILDING ("b"): This will partially rebuild the index from the last disabled timestamp.
UNUSABLE ("d") / INACTIVE ("i"): This will cause the index to no longer be considered for use in queries; however, index maintenance will continue to be performed.
ACTIVE ("a"): The index is ready to use and up to date.
DISABLE ("x"): This will cause no further index maintenance to be performed on the index, and it will no longer be considered for use in queries.
REBUILD ("r"): This will completely rebuild the index and, upon completion, enable the index to be used in queries again.
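As a rough illustration, the state codes above can be inspected and changed from a Phoenix SQL client. The sketch below assumes a hypothetical index MY_INDEX on a hypothetical data table MY_TABLE. The DISABLE and REBUILD keywords are taken from the Phoenix ALTER INDEX grammar, but it is worth verifying them against the Phoenix version you run.

```sql
-- List every index with its one-letter state code
-- (for example "a" for ACTIVE or "x" for DISABLE).
SELECT TABLE_SCHEM, TABLE_NAME, DATA_TABLE_NAME, INDEX_STATE
FROM SYSTEM.CATALOG
WHERE INDEX_STATE IS NOT NULL;

-- MY_INDEX and MY_TABLE are hypothetical names used only for illustration.
-- Stop index maintenance and stop considering the index in queries.
ALTER INDEX MY_INDEX ON MY_TABLE DISABLE;

-- Rebuild the index completely; on completion it is used in queries again.
ALTER INDEX MY_INDEX ON MY_TABLE REBUILD;
```

A full rebuild scans the whole data table, so on a large table it is usually something to schedule rather than run casually.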
What happens when an index update fails for any reason?
The answer is not straightforward, because the implementation depends on the use case or table type. There are two choices:
Choice 1: Block writes to the data table but let the index continue to serve read requests. Maintain a point of "consistency" in the form of a timestamp taken just before the failure occurred. Keep the write block in place until the index table is rebuilt in the background and is in sync with the data table again. Properties involved are:
This option is not yet available in HDP 2 but is available with HDP 3.0.
Choice 2: Writes to the data table are not stopped, but the index table in question is disabled. Rebuilder threads (run from the server hosting SYSTEM.CATALOG) detect the disabled index, mark it "inactive", and partially rebuild it. In this mode, the index table does not serve any client requests. This is the implementation we are using with HDP 2.
Properties involved are:
phoenix.index.failure.handling.rebuild.interval=10000 (10 seconds; the interval at which the server checks whether any index table needs a partial rebuild)
phoenix.index.failure.handling.rebuild.overlap.time=1 (time to go back before index_disable_timestamp to be able to rebuild from that point)
A few scenarios for troubleshooting issues:
The following scenarios give more insight into how index maintenance, updates, and failure handling work in Phoenix. (We will only talk about choice 2 above.)
Scenario 1: An index update is written to the WAL, and the region server hosting the data table crashes before the update is written to the data or index table.
The WAL is replayed and the index updates are committed via server-to-server RPC.
Scenario 2: The data table is written, but the server-to-server RPC to the index table fails.
This is where the state of the index table changes to disabled. A rebuilder thread on the server hosting the SYSTEM.CATALOG table keeps checking these index states. As soon as it detects a "disabled" index table, it starts the rebuild process: it first marks the table as "inactive", then runs a rebuild scan on the data table regions, and finally makes index updates via server-to-server RPCs. Client queries during this time refer only to the data table.
Here it is good to know about INDEX_DISABLE_TIMESTAMP, the timestamp at which the index got disabled. It is 0 if the index is active or was disabled manually by a client, and non-zero if the index was disabled because of a write failure. The automatic rebuild therefore happens only for indexes whose disable timestamp has been set.
The following query shows the value of this column:
select TABLE_NAME, cast(INDEX_DISABLE_TIMESTAMP as timestamp) from SYSTEM.CATALOG where index_state is not null limit 10;
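+-------------------+----------------------------------------+
| TABLE_NAME        | TO_TIMESTAMP(INDEX_DISABLE_TIMESTAMP)  |
+-------------------+----------------------------------------+
| TEST_INDEX_PERF   | 2018-05-26 10:28:54.079                |
| TEST1_INDEX_PERF  | 2018-05-26 10:28:54.079                |
+-------------------+----------------------------------------+
2 rows selected (0.089 seconds)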
Once the rebuild completes in the background, the index table's state changes back to "active". All this while, the data table keeps serving read and write requests.
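To tell indexes disabled by a write failure apart from those disabled manually, the query above can be narrowed to rows with a non-zero disable timestamp. This is only a sketch built on the SYSTEM.CATALOG columns already mentioned. The DISABLED_AT alias is an illustrative name, not a catalog column.

```sql
-- Indexes whose disable timestamp is set, i.e. disabled by a write failure
-- and not yet brought back to the active state.
SELECT TABLE_NAME,
       INDEX_STATE,
       CAST(INDEX_DISABLE_TIMESTAMP AS TIMESTAMP) AS DISABLED_AT
FROM SYSTEM.CATALOG
WHERE INDEX_STATE IS NOT NULL
  AND INDEX_DISABLE_TIMESTAMP <> 0;
```

An empty result would suggest that no index is currently waiting for, or going through, an automatic partial rebuild.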
Scenario 3: The index went into the disabled state, HBase became unresponsive, handlers are saturated (verified from Grafana), queries are extremely slow, and nothing is moving.
Let's break this down into the most probable sequence of events:
1. Multiple clients write to region server 1 (RS1, hosting the data table), using all of the default handlers.
2. No handlers are left on RS1 to write the index update to region server 2 (RS2), which hosts the index table regions.
3. Since the index update is not written on RS2, the client RPC on RS1 is not freed (and if the situation continues, it times out after hbase.rpc.timeout).
4. Because the index update failed, the index table goes into the disabled state.
5. Rebuilder threads detect the disabled state of the index and start rebuilding the table, competing for the same default handler pool and aggravating the situation further.
This is a very common "deadlock" scenario, and users struggle to find what caused all these issues and where to start fixing them. It resembles the classic "dining philosophers" problem from computer science.
The above sequence of events could cause some or all of the following issues:
1. Queries hanging or timing out
2. Region servers becoming unresponsive
3. Clients unable to log in to the Phoenix shell
4. Long GC pauses (due to the large number of objects created)
Point 4 above would eventually break the session with ZooKeeper and may bring the region server down.
What is the solution to this problem?
Since a common pool of default handlers was shared by clients and servers, which caused these issues, it was decided to create a dedicated index handler pool and a custom RPC scheduler for it, and to add a custom RPC controller to the chain of controllers. This controller would filter outgoing index RPCs and tag them for higher priority.
The following parameters were expected to be added for this (already part of HDP 2.6):
However, these added parameters introduced another issue (PHOENIX-3360, PHOENIX-3994). Since clients shared the same hbase-site.xml with these additional parameters, they started sending normal requests tagged with index priority. Similarly, index rebuild scans sent their RPCs tagged with index priority and used the index handler pool, which is not what it was designed for. This led many users to another "deadlock" situation, in which index writes would fail because most index handlers were busy doing rebuild scans or serving clients.
The fix for PHOENIX-3994 (part of HDP 2.6.5) removes the dependency on these parameters for index priority, so they are needed neither on the server side nor on the client side. However, Ambari still adds these parameters, which could still create issues. A quick hack is to remove these two properties from all client-side hbase-site.xml files.
For clients such as NiFi, which source hbase-site.xml from the phoenix-client jar, it would be good to package the updated hbase-site.xml into the jar itself.
If you have many or large index tables that require a substantial number of RPCs, you can also define phoenix.rpc.index.handler.count in the custom hbase-site.xml and give it a value proportional to the total handler count you have defined.
We will discuss a couple more scenarios in Part 3 of this article series.