<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Druid Hive combination for 12TB+ Dataset (OLAP Usecase) in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Druid-Hive-combination-for-12TB-Dataset-OLAP-Usecase/m-p/211645#M69500</link>
    <description>&lt;P style="margin-left: 20px;"&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/30181/roshand.html" nodeid="30181"&gt;@Roshan Dissanayake&lt;/A&gt; &lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;The integration is production ready; we are planning GA in HDP 2.6.3, which is going to be released soon.&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;To answer your question about performance, I don't think the data size is an issue, since Druid/LLAP can scale horizontally. The real question is how much of your query can be pushed down to the Druid cluster. This might require rethinking the schema of the OLAP cubes and maybe rewriting some of the queries.&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;I will be happy to help you with that if you can share the queries and schema.&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;  &lt;/P&gt;</description>
    <pubDate>Fri, 13 Oct 2017 00:47:22 GMT</pubDate>
    <dc:creator>sbouguerra</dc:creator>
    <dc:date>2017-10-13T00:47:22Z</dc:date>
    <item>
      <title>Druid Hive combination for 12TB+ Dataset (OLAP Usecase)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Druid-Hive-combination-for-12TB-Dataset-OLAP-Usecase/m-p/211644#M69499</link>
      <description>&lt;P&gt;Referring to this article &lt;A href="https://hortonworks.com/blog/apache-hive-druid-part-1-3/" target="_blank"&gt;https://hortonworks.com/blog/apache-hive-druid-part-1-3/&lt;/A&gt; by &lt;A rel="user" href="https://community.cloudera.com/users/982/carters.html" nodeid="982" target="_blank"&gt;@Carter Shanklin&lt;/A&gt;&lt;/P&gt;&lt;P&gt;We have a 12TB+ data set (growing by ~8GB per day) of clickstream data (user events) in Hive.&lt;/P&gt;&lt;P&gt;The use case is to run OLAP queries across the data set, for now mostly GROUP BY queries. &lt;/P&gt;&lt;P&gt;How will this combination perform on a data set of this size? &lt;/P&gt;&lt;P&gt;Also, how production ready is the combination? &lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 12:23:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Druid-Hive-combination-for-12TB-Dataset-OLAP-Usecase/m-p/211644#M69499</guid>
      <dc:creator>roshand</dc:creator>
      <dc:date>2022-09-16T12:23:29Z</dc:date>
    </item>
    <item>
      <title>Re: Druid Hive combination for 12TB+ Dataset (OLAP Usecase)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Druid-Hive-combination-for-12TB-Dataset-OLAP-Usecase/m-p/211645#M69500</link>
      <description>&lt;P style="margin-left: 20px;"&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/30181/roshand.html" nodeid="30181"&gt;@Roshan Dissanayake&lt;/A&gt; &lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;The integration is production ready; we are planning GA in HDP 2.6.3, which is going to be released soon.&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;To answer your question about performance, I don't think the data size is an issue, since Druid/LLAP can scale horizontally. The real question is how much of your query can be pushed down to the Druid cluster. This might require rethinking the schema of the OLAP cubes and maybe rewriting some of the queries.&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;I will be happy to help you with that if you can share the queries and schema.&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;  &lt;/P&gt;</description>
      <pubDate>Fri, 13 Oct 2017 00:47:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Druid-Hive-combination-for-12TB-Dataset-OLAP-Usecase/m-p/211645#M69500</guid>
      <dc:creator>sbouguerra</dc:creator>
      <dc:date>2017-10-13T00:47:22Z</dc:date>
    </item>
    <item>
      <title>Re: Druid Hive combination for 12TB+ Dataset (OLAP Usecase)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Druid-Hive-combination-for-12TB-Dataset-OLAP-Usecase/m-p/211646#M69501</link>
      <description>&lt;P&gt;Thanks a lot for the quick reply.&lt;/P&gt;&lt;P&gt;Let me do a setup with a smaller data set and get back to you with some questions. &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Oct 2017 13:42:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Druid-Hive-combination-for-12TB-Dataset-OLAP-Usecase/m-p/211646#M69501</guid>
      <dc:creator>roshand</dc:creator>
      <dc:date>2017-10-13T13:42:54Z</dc:date>
    </item>
    <item>
      <title>Re: Druid Hive combination for 12TB+ Dataset (OLAP Usecase)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Druid-Hive-combination-for-12TB-Dataset-OLAP-Usecase/m-p/211647#M69502</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/12341/sbouguerra.html" nodeid="12341"&gt;@Slim&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Given that this dataset is already loaded into Hive, and the Hive table will be updated occasionally*, what are my chances of using Druid to index this data and Superset to visualise it (without replicating the data in Druid)? And would you recommend this approach? &lt;/P&gt;&lt;P&gt;*Will Druid automatically update its indexes when data is added to Hive? &lt;/P&gt;</description>
      <pubDate>Mon, 23 Oct 2017 12:10:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Druid-Hive-combination-for-12TB-Dataset-OLAP-Usecase/m-p/211647#M69502</guid>
      <dc:creator>roshand</dc:creator>
      <dc:date>2017-10-23T12:10:06Z</dc:date>
    </item>
    <item>
      <title>Re: Druid Hive combination for 12TB+ Dataset (OLAP Usecase)</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Druid-Hive-combination-for-12TB-Dataset-OLAP-Usecase/m-p/211648#M69503</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/30181/roshand.html" nodeid="30181"&gt;@Roshan Dissanayake&lt;/A&gt;  &lt;/P&gt;&lt;P&gt;1- Keep in mind that the indexes carry some extra overhead, so technically speaking some part of the data will be replicated, but in the form of an index, and thus compressed and more concise.&lt;/P&gt;&lt;P&gt;2- Hive will not manage the lifecycle of the Druid indexes; you need to set up Oozie (or any other workflow manager) to run the create table / insert into or drop table statements that keep the indexes up to date.&lt;/P&gt;&lt;P&gt;3- On a side note, I am not sure how the updates land in your Hive system, but if your pattern is mostly append/insert over a period of time, then Druid is designed for that use case, since the data will be partitioned by the time column. &lt;/P&gt;</description>
      <pubDate>Mon, 23 Oct 2017 20:23:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Druid-Hive-combination-for-12TB-Dataset-OLAP-Usecase/m-p/211648#M69503</guid>
      <dc:creator>sbouguerra</dc:creator>
      <dc:date>2017-10-23T20:23:01Z</dc:date>
    </item>
  </channel>
</rss>

