Support Questions

mahipal_ramidi · ‎08-07-2016

We frequently perform Cartesian products that use geospatial functions (e.g. ST_Intersects) in the WHERE clause of our Hive queries. What are the best approaches for tuning these queries for response time and concurrency?

bleonhardi · ‎08-08-2016

Gopal and I gave a couple of tips in the thread linked here for increasing parallelism (Hive is normally not tuned for Cartesian joins and creates too few mappers):

https://community.hortonworks.com/questions/44749/hive-query-running-on-tez-contains-a-mapper-that-h...

Apart from that, my second point still holds: you should add some pre-filtering to reduce the number of points you need to compare. There are many different ways to do this:

https://en.wikipedia.org/wiki/Spatial_database#Spatial_index

For example, you can assign points to grid cells sized so that a point in one cell can never be within your max distance of a point in a non-neighboring cell. You then only need to compare points that fall in the same or adjacent cells.
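To make the grid idea concrete, here is a minimal sketch in plain Python (not Hive). It assumes planar x/y coordinates and a Euclidean max distance, and the names `cell_of` and `pairs_within` are made up for illustration. The cell size is set equal to the max distance, so any two points within that distance must sit in the same cell or in one of the eight cells around it.

```python
import math
from collections import defaultdict


def cell_of(x, y, cell_size):
    """Map a coordinate to its integer grid cell (hypothetical helper)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def pairs_within(points_a, points_b, max_dist):
    """Return all (a, b) pairs whose Euclidean distance is <= max_dist.

    Assumes planar coordinates; real geospatial data would need a
    projection or a proper distance function instead.
    """
    # Bucket the second point set by grid cell. Cell size == max_dist,
    # so matches can only be in the same cell or a directly adjacent one.
    grid = defaultdict(list)
    for b in points_b:
        grid[cell_of(b[0], b[1], max_dist)].append(b)

    result = []
    for a in points_a:
        cx, cy = cell_of(a[0], a[1], max_dist)
        # Only look at the 3x3 block of cells around the point.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for b in grid.get((cx + dx, cy + dy), ()):
                    # Exact check on the few remaining candidates.
                    if math.hypot(a[0] - b[0], a[1] - b[1]) <= max_dist:
                        result.append((a, b))
    return result


points_a = [(0.0, 0.0), (10.0, 10.0)]
points_b = [(0.5, 0.5), (3.0, 3.0), (10.0, 10.9)]
matches = pairs_within(points_a, points_b, 1.0)
```

In a Hive query the same idea would typically show up as a computed cell id used as a join key, so the engine can do an equi-join on cells and apply the expensive geospatial function only to the candidates that survive. How you compute the cell id and handle the neighboring cells depends on your data, so treat the above as an illustration only.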
