<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: We frequently perform Cartesian products involving geospatial functions in the WHERE clause (e.g. ST_Intersects) of our Hive queries. What are the best approaches for tuning those queries for response time and concurrency? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-perform-frequently-Cartesian-products-involving/m-p/167899#M37138</link>
    <description>&lt;P&gt;Gopal and I gave a couple of tips here on increasing the parallelism (Hive is normally not tuned for Cartesian joins and creates too few mappers):&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/44749/hive-query-running-on-tez-contains-a-mapper-that-h.html#comment-45388" target="_blank"&gt;https://community.hortonworks.com/questions/44749/hive-query-running-on-tez-contains-a-mapper-that-h.html#comment-45388&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Apart from that, my second point still holds: you should add some pre-filtering to reduce the number of points you need to compare. There are many ways to do this:&lt;/P&gt;&lt;P&gt;&lt;A href="https://en.wikipedia.org/wiki/Spatial_database#Spatial_index" target="_blank"&gt;https://en.wikipedia.org/wiki/Spatial_database#Spatial_index&lt;/A&gt;&lt;/P&gt;&lt;P&gt;For example, you can put the points into grid cells and skip any pair of cells in which no point of one cell can be within your maximum distance of any point of the other, so the expensive comparison only runs on the remaining candidate pairs.&lt;/P&gt;</description>
    <pubDate>Mon, 08 Aug 2016 17:00:33 GMT</pubDate>
    <dc:creator>bleonhardi</dc:creator>
    <dc:date>2016-08-08T17:00:33Z</dc:date>
    <item>
      <title>We frequently perform Cartesian products involving geospatial functions in the WHERE clause (e.g. ST_Intersects) of our Hive queries. What are the best approaches for tuning those queries for response time and concurrency?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-perform-frequently-Cartesian-products-involving/m-p/167898#M37137</link>
      <description>&lt;P&gt;We frequently perform Cartesian products involving geospatial functions in the WHERE clause (e.g. ST_Intersects) of our Hive queries. What are the best approaches for tuning those queries for response time and concurrency?&lt;/P&gt;</description>
      <pubDate>Mon, 08 Aug 2016 02:57:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-perform-frequently-Cartesian-products-involving/m-p/167898#M37137</guid>
      <dc:creator>mahipal_ramidi</dc:creator>
      <dc:date>2016-08-08T02:57:39Z</dc:date>
    </item>
    <item>
      <title>Re: We frequently perform Cartesian products involving geospatial functions in the WHERE clause (e.g. ST_Intersects) of our Hive queries. What are the best approaches for tuning those queries for response time and concurrency?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-perform-frequently-Cartesian-products-involving/m-p/167899#M37138</link>
      <description>&lt;P&gt;Gopal and I gave a couple of tips here on increasing the parallelism (Hive is normally not tuned for Cartesian joins and creates too few mappers):&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/44749/hive-query-running-on-tez-contains-a-mapper-that-h.html#comment-45388" target="_blank"&gt;https://community.hortonworks.com/questions/44749/hive-query-running-on-tez-contains-a-mapper-that-h.html#comment-45388&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Apart from that, my second point still holds: you should add some pre-filtering to reduce the number of points you need to compare. There are many ways to do this:&lt;/P&gt;&lt;P&gt;&lt;A href="https://en.wikipedia.org/wiki/Spatial_database#Spatial_index" target="_blank"&gt;https://en.wikipedia.org/wiki/Spatial_database#Spatial_index&lt;/A&gt;&lt;/P&gt;&lt;P&gt;For example, you can put the points into grid cells and skip any pair of cells in which no point of one cell can be within your maximum distance of any point of the other, so the expensive comparison only runs on the remaining candidate pairs.&lt;/P&gt;</description>
      <pubDate>Mon, 08 Aug 2016 17:00:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-perform-frequently-Cartesian-products-involving/m-p/167899#M37138</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-08-08T17:00:33Z</dc:date>
    </item>
  </channel>
</rss>

