Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Is it possible in Phoenix to create a functional index on an UDF that uses as arguments columns from two different tables?

Is it possible in Phoenix to create a functional index on an UDF that uses as arguments columns from two different tables?

New Contributor

We need to execute geo-spatial queries where we use geo-spatial functions that operate on columns from two different tables (YELLOW_TAXI_TRIPS, TAXI_ZONES). The UDF works as a comparison to join the two tables, below is the query in which we want to improve performance applying a functional index.

SELECT ZONES.zone, count(1) AS hits
FROM YELLOW_TAXI_TRIPS AS TRIPS, TAXI_ZONES AS ZONES
WHERE STWithinFromText('POINT('||TRIPS.PICKUP_LONGITUDE||' '||TRIPS.PICKUP_LATITUDE||')', ZONES.geom)
GROUP BY ZONES.zone;
2 REPLIES 2
Highlighted

Re: Is it possible in Phoenix to create a functional index on an UDF that uses as arguments columns from two different tables?

Currently , we don't support index involving two tables.

you can index ZONES.geom (include ZONES.zone in your covered index) and index 'POINT('||TRIPS.PICKUP_LONGITUDE||' '||TRIPS.PICKUP_LATITUDE||')' on YELLOW_TAXI_TRIPS separately to improve your query.

Re: Is it possible in Phoenix to create a functional index on an UDF that uses as arguments columns from two different tables?

New Contributor

Thanks for your fast reply! May we give you more details on it?

When we try to create a secondary-index over zones table to improve the use of geom and zone columns on tested query, We got an error

Error: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: TAXI_ZONES,,1511452570485.89b8bbb6f061880983eee4f4665e9a44.: Row length 48250 is > 32767

According to this https://community.hortonworks.com/questions/62213/nifi-putsql-row-length-exception-for-phoenix-upser... We understand that there is a limitation to create an index using this geom column (which is a WKT representation and this value can contain rows with a character length of 271802)

This is the statement we tried to create Index:

CREATE INDEX TAXI_ZONES_INDEXED_GEOM_TEXT ON TAXI_ZONES (geom) INCLUDE(zone);

We are trying to implement a true geo-spatial index, but there is not too much documentation on Phoenix previous works, the only one we found the following: https://github.com/threedliteguy/General/wiki/Adding-spatial-data-queries-to-Phoenix-on-HBase

As long as we understand this approach implements an UDF that is used to pre-calculate geo-spatial information but it is not clear how would indexing work. Do you have any tip or suggestion to work on it?

Thanks!

Don't have an account?
Coming from Hortonworks? Activate your account here