Created 07-22-2016 07:54 PM
If I have a requirement to index 10 columns individually, would that be an appropriate use of Phoenix secondary indexes? Is there a significant impact from maintaining a large number of indexes, and what is the threshold? I understand it's a loaded question, but let's assume a cluster size of 16 nodes and a 100GB daily ingest rate.
Created 07-22-2016 07:57 PM
(assuming you're referring to mutable global indexes..)
There is a direct relationship between the number of indexes and the amount of index data that must be written for every update to the data table. So, if you are indexing 10 columns on a data table, you're actually writing 11 updates for every single update your client writes.
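To make that write amplification concrete, here is a sketch in Phoenix SQL (the table and column names are hypothetical, not from the original question). Each global mutable index is a separate physical table, so one UPSERT on the data table fans out into one write per index:

```sql
-- Hypothetical data table with 10 columns to be indexed individually
CREATE TABLE events (
    id BIGINT NOT NULL PRIMARY KEY,
    col1 VARCHAR, col2 VARCHAR, /* ... */ col10 VARCHAR
);

-- One global index per queried column (10 in total)
CREATE INDEX idx_col1 ON events (col1);
CREATE INDEX idx_col2 ON events (col2);
-- ... idx_col3 through idx_col10 ...

-- A single client upsert now results in 1 data-table write
-- plus 10 index-table writes, maintained on the RegionServers.
UPSERT INTO events (id, col1, col2) VALUES (1, 'a', 'b');
```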
100GB of data per day on 16 nodes seems reasonable to manage, but you are pushing a lot of work to the RegionServers. I would make sure the RegionServers are adequately sized to handle all of the extra load.
Using immutable tables pushes this work to the client, which might be a more scalable solution: https://phoenix.apache.org/secondary_indexing.html#Immutable_Tables
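If the rows are write-once (append-only event data, for example), the table can be declared immutable so that the client computes and writes the index updates itself instead of the RegionServers. A minimal sketch, again with hypothetical names:

```sql
-- IMMUTABLE_ROWS=true tells Phoenix the rows are never updated in place,
-- so index maintenance can be done client-side at write time.
CREATE TABLE events_log (
    id   BIGINT NOT NULL PRIMARY KEY,
    col1 VARCHAR,
    col2 VARCHAR
) IMMUTABLE_ROWS = true;

-- Indexes on an immutable table are maintained by the writing client.
CREATE INDEX idx_log_col1 ON events_log (col1);
```

This only helps if the data really is write-once; updating or deleting rows in a table declared immutable will leave the indexes out of sync.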
Created 07-23-2016 11:59 PM
Josh that link you shared is priceless.