Member since: 08-05-2016
Posts: 76
Kudos Received: 10
Solutions: 13
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2664 | 11-06-2018 08:19 PM |
| | 1886 | 08-31-2018 05:34 PM |
| | 1425 | 05-02-2018 02:21 PM |
| | 2075 | 04-28-2018 09:32 PM |
| | 2282 | 11-09-2017 06:02 PM |
12-05-2017 10:50 PM
Can you create a new issue and attach the logs? It is hard to see what is going on without them. One way to check this is to first run an EXPLAIN on the query, which will show you the generated Druid query; you can then copy that query and try it yourself via a curl command.
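As a rough sketch of that workflow (the table name, broker host, and output file are placeholders, not from the original post):

```shell
# 1) In Hive, print the plan; for a Druid-backed table the plan embeds
#    the generated native Druid JSON query.
hive -e "EXPLAIN SELECT \`__time\`, COUNT(*) FROM my_druid_table GROUP BY \`__time\`;"

# 2) Copy the JSON query from the plan output into query.json, then replay
#    it directly against the Druid broker (default broker port is 8082):
curl -X POST 'http://broker-host:8082/druid/v2/?pretty' \
     -H 'Content-Type: application/json' \
     -d @query.json
```

If the curl call succeeds but Hive fails, the problem is likely on the Hive side; if curl itself errors, the Druid cluster or query is the place to look.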
11-30-2017 06:00 PM
1 Kudo
FYI, Derby is a local-instance DB used only for testing. For production, please use MySQL or Postgres.
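For reference, pointing Druid's metadata store at MySQL in the common runtime properties looks roughly like this (host, database name, and credentials below are placeholders):

```properties
# Druid metadata storage: switch from the default Derby to MySQL.
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://db-host:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=changeme
```

The same connector properties apply for Postgres with `druid.metadata.storage.type=postgresql` and a `jdbc:postgresql://` URI.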
11-09-2017 06:08 PM
Concerning the DIY approach, I cannot provide an exact answer; it is more case by case, I guess. From the Druid perspective, generally speaking, Druid only uses Ambari to manage configs, so it is doable to drop in some jars or update the HDP package alone, but this needs some level of testing to make sure nothing breaks between versions.
11-09-2017 06:02 PM
The current distribution (HDP 2.6.3 with Ambari 2.6) ships Druid 0.10.1 plus tons of patches.
10-23-2017 01:23 PM
@Roshan Dissanayake
1- Keep in mind that the indexes have some extra overhead; technically speaking, part of the data will be replicated, but in the form of an index, and thus compressed and more concise.
2- Hive will not manage the lifecycle of Druid indexes; you need to set up Oozie (or any other workflow manager) to run the CREATE TABLE / INSERT INTO statements, or DROP TABLE, to keep the indexes up to date.
3- On a side note, I am not sure how updates land in your Hive system, but if your pattern is mostly append/insert over a period of time, then Druid is designed for that use case, since data will be partitioned using the time column.
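A minimal sketch of the statements such a scheduled workflow would run (the table, datasource, and column names here are hypothetical):

```sql
-- Build the Druid index from a Hive source table.
CREATE TABLE druid_sales
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS
SELECT CAST(sale_ts AS TIMESTAMP) AS `__time`, store, amount
FROM sales;

-- On each scheduled run, append the newly arrived rows.
INSERT INTO TABLE druid_sales
SELECT CAST(sale_ts AS TIMESTAMP) AS `__time`, store, amount
FROM sales
WHERE sale_ts >= '2017-10-01';

-- Or drop and recreate to fully refresh the index.
DROP TABLE druid_sales;
```

The workflow manager only needs to invoke these statements on a schedule; Hive and the Druid storage handler do the actual segment management.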
10-18-2017 02:11 PM
Thanks, yes, we will update the Wiki. I am not 100% sure, but I hope this will make it into 2.6.3; we are pushing it to the finish line. Thanks.
10-18-2017 01:53 PM
1 Kudo
Sorry, currently we only issue Timeseries and GroupBy queries, and that is by design. The reason we backed off TopN is that Druid's TopN algorithm is approximate, thus not an exact result; that is why we use GroupBy all the time. Since we are a SQL shop, we need to make sure that results are correct rather than just fast. That said, we are adding a new feature to allow approximate results, so the CBO will use TopN when possible if the approximate flag is turned on. Thanks!
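To illustrate the trade-off, here is the shape of a native Druid TopN query (the datasource, dimension, and interval are made up for the example):

```json
{
  "queryType": "topN",
  "dataSource": "wikipedia",
  "dimension": "page",
  "metric": "edits",
  "threshold": 10,
  "granularity": "all",
  "aggregations": [
    { "type": "longSum", "name": "edits", "fieldName": "count" }
  ],
  "intervals": ["2017-01-01/2017-02-01"]
}
```

TopN ranks candidates per segment and merges them, which is fast but can miss the true top rows near the threshold; the GroupBy equivalent replaces `"queryType"` with `"groupBy"` and expresses the top-10 limit via a `limitSpec`, computing the exact answer at higher cost.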
10-12-2017 05:47 PM
1 Kudo
Hi @Roshan Dissanayake, the integration is production ready; we are planning on GA in HDP 2.6.3, which is going to be released soon. To answer your question about performance, I don't think the data size is an issue, since Druid/LLAP can scale horizontally. The real question is how much of your query can be pushed down to the Druid cluster. This might require rethinking the schema of the OLAP cubes and maybe rewriting some of the queries. I will be happy to help you with that if you can share the queries and schema.
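As a rough illustration of what rethinking a cube schema can mean: rolling the data up to the coarsest granularity your queries actually need at indexing time (the table, columns, and granularities below are assumptions for the example):

```sql
-- Index pre-aggregated data; Druid rolls rows up to one row per hour
-- per (site, country) combination, per druid.query.granularity.
CREATE TABLE cube_pageviews
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "druid.segment.granularity" = "MONTH",
  "druid.query.granularity"   = "HOUR"
)
AS
SELECT CAST(view_ts AS TIMESTAMP) AS `__time`,
       site,
       country,
       COUNT(*) AS views
FROM raw_pageviews
GROUP BY CAST(view_ts AS TIMESTamp), site, country;
```

Queries that group by `site`/`country` over time ranges can then be answered entirely inside Druid, while anything touching columns left out of the cube falls back to Hive/LLAP.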
10-11-2017 02:17 PM
Not sure what the final use case is, but one way to do this is:

hive> CREATE TABLE foo (bar CHAR(8));
hive> INSERT INTO foo VALUES ("00008DAC");
hive> SELECT * FROM foo;
OK
00008DAC
10-04-2017 11:00 PM
Can you start a new thread and add more information about the install process and the stack trace?