Created 11-29-2017 08:38 PM
We need to execute geo-spatial queries where we use geo-spatial functions that operate on columns from two different tables (YELLOW_TAXI_TRIPS, TAXI_ZONES). The UDF works as a comparison to join the two tables, below is the query in which we want to improve performance applying a functional index.
SELECT ZONES.zone, count(1) AS hits FROM YELLOW_TAXI_TRIPS AS TRIPS, TAXI_ZONES AS ZONES WHERE STWithinFromText('POINT('||TRIPS.PICKUP_LONGITUDE||' '||TRIPS.PICKUP_LATITUDE||')', ZONES.geom) GROUP BY ZONES.zone;
Created 11-30-2017 11:25 AM
Currently , we don't support index involving two tables.
you can index ZONES.geom (include ZONES.zone in your covered index) and index 'POINT('||TRIPS.PICKUP_LONGITUDE||' '||TRIPS.PICKUP_LATITUDE||')' on YELLOW_TAXI_TRIPS separately to improve your query.
Created 11-30-2017 03:06 PM
Thanks for your fast reply! May we give you more details on it?
When we try to create a secondary-index over zones table to improve the use of geom and zone columns on tested query, We got an error
Error: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: TAXI_ZONES,,1511452570485.89b8bbb6f061880983eee4f4665e9a44.: Row length 48250 is > 32767
According to this https://community.hortonworks.com/questions/62213/nifi-putsql-row-length-exception-for-phoenix-upser... We understand that there is a limitation to create an index using this geom column (which is a WKT representation and this value can contain rows with a character length of 271802)
This is the statement we tried to create Index:
CREATE INDEX TAXI_ZONES_INDEXED_GEOM_TEXT ON TAXI_ZONES (geom) INCLUDE(zone);
We are trying to implement a true geo-spatial index, but there is not too much documentation on Phoenix previous works, the only one we found the following: https://github.com/threedliteguy/General/wiki/Adding-spatial-data-queries-to-Phoenix-on-HBase
As long as we understand this approach implements an UDF that is used to pre-calculate geo-spatial information but it is not clear how would indexing work. Do you have any tip or suggestion to work on it?
Thanks!