<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to generate a TopN query with a Hive table linked to a Druid datasource in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-generate-a-TopN-query-with-a-Hive-table-linked-to-a/m-p/223791#M69891</link>
    <description>&lt;P&gt;Thanks yes we will update the Wiki.&lt;/P&gt;&lt;P&gt;Am not 100%, but I hope this will make it to 2.6.3, we are pushing it to the finish line.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
    <pubDate>Wed, 18 Oct 2017 21:11:20 GMT</pubDate>
    <dc:creator>sbouguerra</dc:creator>
    <dc:date>2017-10-18T21:11:20Z</dc:date>
    <item>
      <title>How to generate a TopN query with a Hive table linked to a Druid datasource</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-generate-a-TopN-query-with-a-Hive-table-linked-to-a/m-p/223788#M69888</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Following the Hive documentation (&lt;A href="https://cwiki.apache.org/confluence/display/Hive/Druid+Integration" target="_blank"&gt;https://cwiki.apache.org/confluence/display/Hive/Druid+Integration&lt;/A&gt;) I linked a Hive table to an existing Druid datasource using the DruidStorageHandler. I managed to do select, timeseries and groupBy queries without any trouble but impossible to generate a TopN query. It always falls back on the groupBy.&lt;/P&gt;&lt;P&gt;My datasource (with a DAY granularity) schema is : &lt;/P&gt;&lt;PRE&gt;+------------+------------+--------------------+--+
|  col_name  | data_type  |      comment       |
+------------+------------+--------------------+--+
| __time     | timestamp  | from deserializer  |
| dimension1 | string     | from deserializer  |
| metric1    | bigint     | from deserializer  |
| dimension2 | string     | from deserializer  |
+------------+------------+--------------------+--+
&lt;/PRE&gt;&lt;P&gt;The query I'm running is : &lt;/P&gt;&lt;PRE&gt;SELECT `dimension1`, `floor_day`(`__time`), sum(`metric1`) as s FROM my_db.my_table GROUP BY `dimension1`, `floor_day`(`__time`) ORDER BY s DESC LIMIT 10;&lt;/PRE&gt;&lt;P&gt;If I 'EXPLAIN' it, I have : &lt;/P&gt;&lt;PRE&gt;Plan optimized by CBO.                                                                               Stage-0
    Fetch Operator
      limit:-1
      Select Operator [SEL_1]
        Output:["_col0","_col1","_col2"]
        TableScan [TS_0]
	  Output:["dimension1","floor_day","$f2"],properties:{"druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"my_datasource\",\"granularity\":\"all\",\"dimensions\":[{\"type\":\"default\",\"dimension\":\"dimension1\"},{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"floor_day\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",\"granularity\":\"day\",\"timeZone\":\"UTC\",\"locale\":\"en-US\"}}],\"limitSpec\":{\"type\":\"default\",\"limit\":10,\"columns\":[{\"dimension\":\"$f2\",\"direction\":\"descending\",\"dimensionOrder\":\"numeric\"}]},\"aggregations\":[{\"type\":\"longSum\",\"name\":\"$f2\",\"fieldName\":\"count\"}],\"intervals\":[\"1900-01-01T00:00:00.000/3000-01-01T00:00:00.000\"]}","druid.query.type":"groupBy"}                                                                                   
&lt;/PRE&gt;&lt;P&gt;I'm using HDP-2.6.2.0-205 with Hive 2.1.0 and Druid 0.9.2.&lt;/P&gt;&lt;P&gt;Is there any configuration I'm missing ? Is my query badly written ? I tried to stick to the doc as much as possible.&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 12:25:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-generate-a-TopN-query-with-a-Hive-table-linked-to-a/m-p/223788#M69888</guid>
      <dc:creator>pierre_gunet</dc:creator>
      <dc:date>2022-09-16T12:25:10Z</dc:date>
    </item>
    <item>
      <title>Re: How to generate a TopN query with a Hive table linked to a Druid datasource</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-generate-a-TopN-query-with-a-Hive-table-linked-to-a/m-p/223789#M69889</link>
      <description>&lt;P&gt;Sorry currently we only issue Time-series and Group-by queries and it is by design. The reason we backed of  TopN is that Druid's TopN algorithm is an approximate thus is not the exact result, that is why we use Group-by all the time. Since we are a SQL shop, we need to make sure that results are correct rather than running fast. Although we are adding a new feature to allow approximate results thus the CBO will use TopN when possible if the approximate flag is turned On. Thanks! &lt;/P&gt;</description>
      <pubDate>Wed, 18 Oct 2017 20:53:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-generate-a-TopN-query-with-a-Hive-table-linked-to-a/m-p/223789#M69889</guid>
      <dc:creator>sbouguerra</dc:creator>
      <dc:date>2017-10-18T20:53:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to generate a TopN query with a Hive table linked to a Druid datasource</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-generate-a-TopN-query-with-a-Hive-table-linked-to-a/m-p/223790#M69890</link>
      <description>&lt;P&gt;I'm impressed to have such a quick reply ! Thanks &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Understood, it's normal that it's falling back for now. However, maybe you should update the Hive documentation to say that TopN queries are currently unsupported ?&lt;/P&gt;&lt;P&gt;Final thing : it's great you're planning to add it with an approximate flag. Maybe it's too far to know, but any idea of when could we expect this feature ?&lt;/P&gt;</description>
      <pubDate>Wed, 18 Oct 2017 21:03:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-generate-a-TopN-query-with-a-Hive-table-linked-to-a/m-p/223790#M69890</guid>
      <dc:creator>pierre_gunet</dc:creator>
      <dc:date>2017-10-18T21:03:58Z</dc:date>
    </item>
    <item>
      <title>Re: How to generate a TopN query with a Hive table linked to a Druid datasource</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-generate-a-TopN-query-with-a-Hive-table-linked-to-a/m-p/223791#M69891</link>
      <description>&lt;P&gt;Thanks yes we will update the Wiki.&lt;/P&gt;&lt;P&gt;Am not 100%, but I hope this will make it to 2.6.3, we are pushing it to the finish line.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Oct 2017 21:11:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-generate-a-TopN-query-with-a-Hive-table-linked-to-a/m-p/223791#M69891</guid>
      <dc:creator>sbouguerra</dc:creator>
      <dc:date>2017-10-18T21:11:20Z</dc:date>
    </item>
  </channel>
</rss>

