Created on 11-17-2018 12:03 AM - edited 08-17-2019 05:45 AM
Phoenix secondary indexes are useful for point lookups or scans against columns that are not part of the Phoenix primary key (that is, not part of the HBase row key). They save you the full scan of the data table that you would otherwise need in order to retrieve data by a non-row-key column.
You create a secondary index by choosing existing non-primary-key columns from the data table and making each one either part of the index's primary key or a covered column. A covered column is one whose data is copied exactly from the data table into the index table.
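As a minimal sketch of the idea above, the following Phoenix SQL creates an index with one indexed column and one covered column. The table ORDERS, its columns, and the index name are hypothetical and chosen only for illustration.

```sql
-- Hypothetical data table; ORDER_ID is the primary key (the HBase row key).
CREATE TABLE IF NOT EXISTS ORDERS (
    ORDER_ID      BIGINT NOT NULL PRIMARY KEY,
    CUSTOMER_ID   VARCHAR,
    ORDER_STATUS  VARCHAR,
    AMOUNT        DECIMAL(10, 2)
);

-- CUSTOMER_ID becomes the leading part of the index table's primary key;
-- AMOUNT is copied into the index table as a covered column.
CREATE INDEX IF NOT EXISTS ORDERS_CUSTOMER_IDX
    ON ORDERS (CUSTOMER_ID) INCLUDE (AMOUNT);

-- Both referenced columns exist in the index, so this lookup does not
-- need to scan the whole data table.
SELECT CUSTOMER_ID, AMOUNT
FROM ORDERS
WHERE CUSTOMER_ID = 'C-1001';
```

A query that also selected ORDER_STATUS would no longer be fully covered by this index, because that column is neither indexed nor included.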
Functional Index: Built on expressions (functions) rather than just on plain columns.
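A functional index might look like the sketch below. The EMP table and its columns are hypothetical; the pattern of indexing an UPPER(...) expression for case-insensitive lookups follows the style of example used in the Phoenix documentation.

```sql
-- Hypothetical employee table used only for illustration.
CREATE TABLE IF NOT EXISTS EMP (
    EMP_ID      BIGINT NOT NULL PRIMARY KEY,
    FIRST_NAME  VARCHAR,
    LAST_NAME   VARCHAR
);

-- The index is built on an expression rather than on a plain column.
CREATE INDEX IF NOT EXISTS UPPER_NAME_IDX
    ON EMP (UPPER(FIRST_NAME || ' ' || LAST_NAME));

-- A query that uses the same expression is a candidate for this index.
SELECT EMP_ID
FROM EMP
WHERE UPPER(FIRST_NAME || ' ' || LAST_NAME) = 'JOHN DOE';
```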
Global Secondary Index: This is the type where we make an exact copy of the covered columns and call it the index table. In simple terms, it's an upsert select on all chosen columns from the data table into the index table.
Because index creation involves a lot of writes up front, and each later write to the data table must also update the index, this type of index works best for read-heavy use cases where data is written only occasionally and read frequently.
There are two ways we can create a global index:
For specific commands for creating the various types of indexes, refer here.
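The original list of the two ways is not preserved in this text. Phoenix's CREATE INDEX does support both a default synchronous build and an ASYNC option, so the sketch below assumes those are the two approaches meant. The PAGE_VIEWS table and the index names are hypothetical.

```sql
-- Hypothetical data table used only for illustration.
CREATE TABLE IF NOT EXISTS PAGE_VIEWS (
    VIEW_ID   BIGINT NOT NULL PRIMARY KEY,
    USER_ID   VARCHAR,
    PAGE_URL  VARCHAR
);

-- Synchronous (default): the statement populates the index from the
-- existing data before it returns. Suitable for small tables.
CREATE INDEX IF NOT EXISTS PAGE_VIEWS_USER_IDX
    ON PAGE_VIEWS (USER_ID);

-- Asynchronous: only the index metadata is created here. The index is
-- populated later by a separate MapReduce job (Phoenix's IndexTool),
-- which is the usual choice for large existing tables.
CREATE INDEX IF NOT EXISTS PAGE_VIEWS_URL_IDX
    ON PAGE_VIEWS (PAGE_URL) ASYNC;
```

An index created with ASYNC is not usable by queries until the population job has completed.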
Thus, a global index (above) assumes the following points:
What if some or all of the above assumptions do not hold? That's where a local index becomes useful. It is part of the data table itself, in the form of a shadow column family (eliminating assumption 1); it is the best fit for write-heavy use cases (eliminating assumption 2); and it can be used for partially covered queries, since the data and the index co-reside (eliminating assumption 3).
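For comparison, a local index is created with the LOCAL keyword. This is a minimal sketch; the SENSOR_READINGS table, its columns, and the index name are hypothetical.

```sql
-- Hypothetical data table used only for illustration.
CREATE TABLE IF NOT EXISTS SENSOR_READINGS (
    READING_ID     BIGINT NOT NULL PRIMARY KEY,
    SENSOR_ID      VARCHAR,
    READING_VALUE  DOUBLE,
    SITE_NAME      VARCHAR
);

-- LOCAL keeps the index data alongside the data table's regions
-- instead of in a separate index table.
CREATE LOCAL INDEX IF NOT EXISTS SENSOR_LOCAL_IDX
    ON SENSOR_READINGS (SENSOR_ID);

-- SITE_NAME is not part of the index, so this query is only partially
-- covered; a local index can still be used because the remaining
-- columns can be read from the co-located data.
SELECT SENSOR_ID, SITE_NAME
FROM SENSOR_READINGS
WHERE SENSOR_ID = 'S-42';
```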
For all practical purposes, I will talk only about the global index, as it is the most common use case and the most stable option so far.
To go into the details of index maintenance, we also need to know about another type of global secondary index:
Immutable Global Secondary Index: This is the type of index that is written once and never updated in place. The client that writes to the data table is also responsible for writing to the index table (at the same time!). Thus it is purely the client's responsibility to keep the data and index tables in sync.
Use cases such as time-series data or event logs can take advantage of immutable data and index tables. (Create the data table with the IMMUTABLE_ROWS=true option, and all indexes created on it default to immutable.)
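A minimal sketch of the IMMUTABLE_ROWS option mentioned above, using a hypothetical EVENT_LOG table for write-once event data:

```sql
-- Hypothetical write-once event table. IMMUTABLE_ROWS=true declares
-- that rows are never updated after they are first written.
CREATE TABLE IF NOT EXISTS EVENT_LOG (
    EVENT_ID    VARCHAR NOT NULL PRIMARY KEY,
    EVENT_TYPE  VARCHAR,
    PAYLOAD     VARCHAR
) IMMUTABLE_ROWS=true;

-- No extra keyword is needed on the index: an index created on an
-- immutable table is treated as immutable by default.
CREATE INDEX IF NOT EXISTS EVENT_TYPE_IDX
    ON EVENT_LOG (EVENT_TYPE) INCLUDE (PAYLOAD);
```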
Mutable Global Secondary Index: Here index maintenance is done via server-to-server RPC (remember the network and handler overhead!) between the data table's region server and the index table's region server. For simplicity, we can assume that if the client was able to write to the data table successfully, the writes to the index table were also completed by the data region server. However, many issues exist around this aspect, which we will discuss in Part 2.
There are two more varieties of tables: transactional and non-transactional. Transactional tables are intended to provide atomic writes to the data and index tables (ACID compliant) and are still a work in progress. Thus, in the next few sections and articles, we will for all practical purposes talk about non-transactional mutable global secondary indexes.
Here are the steps involved in index maintenance:
Understanding these steps is very important because in Part 2 of this article series we will discuss various issues that appear in index maintenance: the index going out of sync, the index getting disabled, queries slowing down, region servers becoming unresponsive, and so on.
Created on 11-19-2018 03:50 PM
Fabulous stuff @Gaurav Sharma !
Created on 11-19-2018 06:27 PM
Thank you @Dinesh Chitlangia
Created on 11-28-2018 09:01 PM
Thanks @Gaurav Sharma for the article, very well explained. Could you please allow access to, or make public, the image https://docs.google.com/drawings/d/sGd5g0DKnEVh_4PRmJLUNOw/image?w=602&h=399&rev=1&ac=1&parent=1tgeX...
Created on 12-07-2018 09:49 PM
Done. Thanks for reporting.