<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Range queries in Impala in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Range-queries-in-Impala/m-p/24258#M4638</link>
    <description>&lt;P&gt;Hi MickeyMouse,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My understanding is that you are looking to answer k-nearest-neighbour (kNN) queries, i.e., given a query lat/long, find the k nearest lat/long points in the dataset.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You could answer such queries in Impala by transforming the kNN query into a series of range queries (keep increasing the range until you've found at least k answers).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;One way to make the range queries efficient would be to partition your table based on lat and/or long. Of course, there are many distinct lat/long values, so you'd probably need to create buckets in the space of lat/long values (e.g., a grid structure). You'd then need to transform the lat/long values given in the query into grid coordinates. This way Impala's partition pruning will kick in, and only the grid cells that overlap the specified range will be searched. Just a high-level idea; I hope it makes sense.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm afraid there is no easy and efficient way to directly answer kNN queries in Impala today.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Alex&lt;/P&gt;</description>
    <pubDate>Sat, 31 Jan 2015 00:51:58 GMT</pubDate>
    <dc:creator>alex.behm</dc:creator>
    <dc:date>2015-01-31T00:51:58Z</dc:date>
    <item>
      <title>Range queries in Impala</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Range-queries-in-Impala/m-p/23759#M4636</link>
      <description>&lt;P&gt;Greetings. Is there an efficient way to do range queries in Impala? For example, I have a column called latitude of type DOUBLE. I'd like to find all rows whose latitude lies in the range x-5 to x+5, where x is a double. I can scan all rows, but that's O(n). Is there a more efficient way, one that can return results in real time over a billion rows? Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:19:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Range-queries-in-Impala/m-p/23759#M4636</guid>
      <dc:creator>MickeyMouse</dc:creator>
      <dc:date>2022-09-16T09:19:19Z</dc:date>
    </item>
    <item>
      <title>Re: Range queries in Impala</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Range-queries-in-Impala/m-p/23799#M4637</link>
      <description>&lt;P&gt;Hi, I'd like to make the question more specific. Here's my use case:&lt;/P&gt;&lt;P&gt;I have to store a huge number of latitude (lat) and longitude (long) values, and I have to find the nearest lats and longs to a given lat/long. Is there an efficient way to do this in Impala? Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Jan 2015 18:48:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Range-queries-in-Impala/m-p/23799#M4637</guid>
      <dc:creator>MickeyMouse</dc:creator>
      <dc:date>2015-01-16T18:48:28Z</dc:date>
    </item>
    <item>
      <title>Re: Range queries in Impala</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Range-queries-in-Impala/m-p/24258#M4638</link>
      <description>&lt;P&gt;Hi MickeyMouse,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My understanding is that you are looking to answer k-nearest-neighbour (kNN) queries, i.e., given a query lat/long, find the k nearest lat/long points in the dataset.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You could answer such queries in Impala by transforming the kNN query into a series of range queries (keep increasing the range until you've found at least k answers).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;One way to make the range queries efficient would be to partition your table based on lat and/or long. Of course, there are many distinct lat/long values, so you'd probably need to create buckets in the space of lat/long values (e.g., a grid structure). You'd then need to transform the lat/long values given in the query into grid coordinates. This way Impala's partition pruning will kick in, and only the grid cells that overlap the specified range will be searched. Just a high-level idea; I hope it makes sense.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm afraid there is no easy and efficient way to directly answer kNN queries in Impala today.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Alex&lt;/P&gt;</description>
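      <!--
A minimal Python sketch of the two ideas in the answer above, under assumptions that are not from the thread: the table is hypothetically partitioned by integer grid coordinates lat_bucket = floor(lat / CELL_DEG) and lon_bucket = floor(lon / CELL_DEG) (the column names and the 1-degree cell size are illustrative); distance is planar Euclidean distance in degrees rather than great-circle distance; and range_query is a stand-in for issuing a SELECT with the generated WHERE clause against Impala. range_predicate translates a lat/long box into grid space so that partition pruning can, in principle, apply, and knn doubles the box until it returns at least k rows. One step the post leaves implicit: a square box holding k rows can still miss a closer point just outside one of its edges, so the sketch issues one final query whose half-width is the k-th smallest distance found, which covers the whole disc of that radius.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # (lat, lon) in degrees
RangeQuery = Callable[[Point, float], List[Point]]

CELL_DEG = 1.0  # hypothetical grid cell size in degrees


def cell_of(lat: float, lon: float, cell: float = CELL_DEG) -> Tuple[int, int]:
    """Map a lat/long to its integer grid cell (the hypothetical partition key)."""
    return math.floor(lat / cell), math.floor(lon / cell)


def range_predicate(lat: float, lon: float, r: float, cell: float = CELL_DEG) -> str:
    """WHERE clause for the box [lat-r, lat+r] x [lon-r, lon+r].

    The bucket conditions are what partition pruning can use; the raw
    lat/lon conditions then drop rows in those cells that fall outside the box.
    """
    lat_lo, lon_lo = cell_of(lat - r, lon - r, cell)
    lat_hi, lon_hi = cell_of(lat + r, lon + r, cell)
    return (
        f"lat_bucket BETWEEN {lat_lo} AND {lat_hi} "
        f"AND lon_bucket BETWEEN {lon_lo} AND {lon_hi} "
        f"AND lat BETWEEN {lat - r} AND {lat + r} "
        f"AND lon BETWEEN {lon - r} AND {lon + r}"
    )


def knn(q: Point, k: int, range_query: RangeQuery,
        r0: float = CELL_DEG, max_r: float = 360.0) -> List[Point]:
    """Answer a kNN query with a series of growing range queries."""
    if k <= 0:
        return []

    def dist(p: Point) -> float:
        # Planar distance in degrees (an assumption, not great-circle distance).
        return math.hypot(p[0] - q[0], p[1] - q[1])

    r = r0
    hits = range_query(q, r)
    while len(hits) < k and r < max_r:
        r *= 2  # keep increasing the range until at least k rows come back
        hits = range_query(q, r)

    if len(hits) >= k:
        # A box holding k rows can still miss a closer row just outside one of
        # its edges, so re-query once with half-width d_k, which covers the
        # whole disc of radius d_k around q.
        d_k = sorted(dist(p) for p in hits)[k - 1]
        if d_k > r:
            hits = range_query(q, d_k)
    return sorted(hits, key=dist)[:k]


def make_in_memory_query(points: List[Point]) -> RangeQuery:
    """Stand-in for running SELECT lat, lon ... WHERE range_predicate(...) in Impala."""
    def query(q: Point, r: float) -> List[Point]:
        return [p for p in points
                if abs(p[0] - q[0]) <= r and abs(p[1] - q[1]) <= r]
    return query


if __name__ == "__main__":
    print(range_predicate(40.0, -74.0, 0.5))
    pts = [(0.0, 0.0), (0.5, 0.5), (3.0, 0.0), (0.0, 1.9), (10.0, 10.0)]
    print(knn((0.0, 0.0), 3, make_in_memory_query(pts)))
    # prints [(0.0, 0.0), (0.5, 0.5), (0.0, 1.9)]
```

Doubling the radius keeps the number of round trips roughly logarithmic in the final search radius, at the cost of sometimes scanning up to about four times the area strictly needed. The sketch does not handle boxes that cross the antimeridian or the poles; a real grid scheme would need to split such ranges.
      -->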
      <pubDate>Sat, 31 Jan 2015 00:51:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Range-queries-in-Impala/m-p/24258#M4638</guid>
      <dc:creator>alex.behm</dc:creator>
      <dc:date>2015-01-31T00:51:58Z</dc:date>
    </item>
  </channel>
</rss>

