Support Questions
Find answers, ask questions, and share your expertise

Is it possible in Phoenix to create a functional index on an UDF that uses as arguments columns from two different tables?

New Contributor

We need to execute geo-spatial queries where we use geo-spatial functions that operate on columns from two different tables (YELLOW_TAXI_TRIPS, TAXI_ZONES). The UDF works as a comparison to join the two tables, below is the query in which we want to improve performance applying a functional index.

SELECT ZONES.zone, count(1) AS hits
FROM YELLOW_TAXI_TRIPS AS TRIPS, TAXI_ZONES AS ZONES
WHERE STWithinFromText('POINT('||TRIPS.PICKUP_LONGITUDE||' '||TRIPS.PICKUP_LATITUDE||')', ZONES.geom)
GROUP BY ZONES.zone;
2 REPLIES 2

Currently , we don't support index involving two tables.

you can index ZONES.geom (include ZONES.zone in your covered index) and index 'POINT('||TRIPS.PICKUP_LONGITUDE||' '||TRIPS.PICKUP_LATITUDE||')' on YELLOW_TAXI_TRIPS separately to improve your query.

New Contributor

Thanks for your fast reply! May we give you more details on it?

When we try to create a secondary-index over zones table to improve the use of geom and zone columns on tested query, We got an error

Error: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: TAXI_ZONES,,1511452570485.89b8bbb6f061880983eee4f4665e9a44.: Row length 48250 is > 32767

According to this https://community.hortonworks.com/questions/62213/nifi-putsql-row-length-exception-for-phoenix-upser... We understand that there is a limitation to create an index using this geom column (which is a WKT representation and this value can contain rows with a character length of 271802)

This is the statement we tried to create Index:

CREATE INDEX TAXI_ZONES_INDEXED_GEOM_TEXT ON TAXI_ZONES (geom) INCLUDE(zone);

We are trying to implement a true geo-spatial index, but there is not too much documentation on Phoenix previous works, the only one we found the following: https://github.com/threedliteguy/General/wiki/Adding-spatial-data-queries-to-Phoenix-on-HBase

As long as we understand this approach implements an UDF that is used to pre-calculate geo-spatial information but it is not clear how would indexing work. Do you have any tip or suggestion to work on it?

Thanks!

; ;