About sbouguerra

sbouguerra · 11-07-2018

Correct, @Matthieu Lamairesse: Druid is case-sensitive while Hive is not, so to make this work you need to make sure that all the column names are lowercase.

sbouguerra · 11-06-2018

I looked at the code, and it seems that in the current implementation the timestamp column is hard-coded to be __time. That is why you are getting the exceptions: your column is called `timestamp`. https://github.com/apache/hive/blob/a51e6aeaf816bdeea5e91ba3a0fab8a31b3a496d/druid-handler/src/java/org/apache/hadoop/hive/druid/DruidStorageHandler.java#L301 If this is the case, it is a serious limitation and needs to be fixed. @Nishant Bangarwa, what do you think?
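Until that is addressed, a possible workaround is to alias the time column to `__time` while loading. This is a sketch assuming a hypothetical source table `events` with columns `timestamp`, `userid` and `num_l`; the target table name `druid_events` is also illustrative.

```sql
-- Hypothetical source table `events`; its time column is named `timestamp`.
-- Backticks are required because TIMESTAMP is a reserved word in HiveQL.
CREATE TABLE druid_events (`__time` timestamp, `userid` string, `num_l` float)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY");

-- Rename the time column to the hard-coded `__time` on the way in.
INSERT INTO TABLE druid_events
SELECT CAST(`timestamp` AS timestamp) AS `__time`, `userid`, `num_l`
FROM events;
```

On Hive 3 (HDP 3.x), the `__time` column may need to be declared as `timestamp with local time zone` instead of `timestamp`; check the documentation for your version.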

sbouguerra · 08-31-2018

If you are only interested in rolling up SUM and COUNT, you can output the raw data as is, and "druid.query.granularity" = "HOUR" (note: query granularity, not segment granularity) will do the rollup for you. If you want to compute other rollup metrics, such as MIN/MAX/AVG, you need to do the rollup in Hive before loading. If you share an example of your use case, I can help explain more. Thanks.
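A sketch of both cases, assuming a hypothetical raw table `raw_events(ts timestamp, userid string, clicks bigint, latency double)` and assuming, as described above, that Druid sums numeric metric columns during rollup. The first table leaves the rollup to Druid via the query granularity; the second pre-aggregates MIN/MAX per hour in Hive. All table and column names are illustrative.

```sql
-- Case 1: SUM-style metric. Load raw rows; with druid.query.granularity = HOUR,
-- Druid rolls them up per hour and dimension combination at ingestion time.
CREATE TABLE druid_clicks_hourly (`__time` timestamp, `userid` string, `clicks` bigint)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "druid.segment.granularity" = "DAY",
  "druid.query.granularity" = "HOUR"
);

INSERT INTO TABLE druid_clicks_hourly
SELECT `ts` AS `__time`, `userid`, `clicks`
FROM raw_events;

-- Case 2: MIN/MAX. Roll up in Hive first by truncating the timestamp to the hour,
-- so Druid receives at most one row per (hour, userid) and has nothing left to merge.
CREATE TABLE druid_latency_hourly (`__time` timestamp, `userid` string,
                                   `min_latency` double, `max_latency` double)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "druid.segment.granularity" = "DAY",
  "druid.query.granularity" = "HOUR"
);

INSERT INTO TABLE druid_latency_hourly
SELECT `__time`, `userid`,
       MIN(`latency`) AS `min_latency`,
       MAX(`latency`) AS `max_latency`
FROM (
  -- date_format keeps the hour and replaces minutes/seconds with 00.
  SELECT CAST(date_format(`ts`, 'yyyy-MM-dd HH:00:00') AS timestamp) AS `__time`,
         `userid`, `latency`
  FROM raw_events
) hourly
GROUP BY `__time`, `userid`;
```

The subquery computes the truncated hour once so the outer GROUP BY can refer to it by alias.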

sbouguerra · 08-31-2018

Please take a look at this page: HDP-2.6-Hive-Druid

Q: How much pre-processing is needed on the Hive table creation? Should I "clean" the data so that there is no further aggregation on Druid, or will the granularity settings take care of aggregation on the Druid side? If so, where should those aggregations be defined? (So if I want "HOUR" granularity, should I pre-process the table to group by hour already and do all the aggregations within Hive?)
A: Use "druid.query.granularity" = "HOUR".

Q: Is there any support for "HyperUnique" in this workflow? I am looking at doing something like "unique user ids".
A: Sorry, no complex metrics.

Q: One of my challenges is that new metrics are added on a weekly/monthly basis. How will I support that if I need to load the data daily into Druid? How would you handle schema evolution?
A: Use INSERT INTO for newer data and ALTER TABLE to add new columns.
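A minimal sketch of that schema-evolution approach, assuming an existing Druid-backed table `druid_table(__time, userid, num_l)`, a hypothetical new metric `new_metric`, a hypothetical daily staging table `daily_events`, and a hypothetical `load_date` variable passed in with `--hiveconf load_date=...`:

```sql
-- One-off: add the new metric column to the existing Druid-backed table.
ALTER TABLE druid_table ADD COLUMNS (`new_metric` float);

-- Daily load: append only the new day's rows, now including the new column.
INSERT INTO TABLE druid_table
SELECT CAST(`timecolumn` AS timestamp) AS `__time`, `userid`, `num_l`, `new_metric`
FROM daily_events
WHERE to_date(`timecolumn`) = '${hiveconf:load_date}';
```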

sbouguerra · 05-02-2018

Use an INSERT INTO statement:

create table test_table(`timecolumn` timestamp, `userid` string, `num_l` float);
insert into test_table values ('2015-01-08 00:00:00', 'i1-start', 4);
CREATE TABLE druid_table (`__time` timestamp, `userid` string, `num_l` float)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY");
INSERT INTO TABLE druid_table
SELECT cast(`timecolumn` as timestamp) AS `__time`, `userid`, `num_l` FROM test_table;

sbouguerra · 05-02-2018

It comes from the Hive configuration as a default. You can set hive.druid.storage.storageDirectory from Ambari to make it global for all sessions: https://github.com/b-slim/hive/blob/f8bc4868eced2ca83113579b626e279bbe6d5b13/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2626
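To check or override the value for a single session instead of globally, something like the following should work from Beeline. The HDFS path is only a placeholder, not a recommended location.

```sql
-- Print the value the current session is using (the Hive conf default unless overridden).
SET hive.druid.storage.storageDirectory;

-- Override it for this session only; the path below is a placeholder.
SET hive.druid.storage.storageDirectory=/apps/druid/segments;
```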

sbouguerra · 04-28-2018

@ln Chari, the properties you need are:

set hive.druid.metadata.db.type=postgresql;
set hive.druid.metadata.username=${DRUID_USERNAME};
set hive.druid.metadata.password=${DRUID_PASSWORD};
set hive.druid.metadata.uri=jdbc:postgresql://${METADATA_DRUID_HOST}:5432/druid;

You were missing hive.druid.metadata.db.type. Also make sure that ${DRUID_USERNAME} and the other ${...} placeholders are replaced with actual values. You do not need all the properties starting with druid.*.

sbouguerra · 04-03-2018

In this case, increasing the ulimit on the number of processes for the Druid user will fix this issue. This article explains the issue and the best way to fix it.

Exception stack:

2018-04-02T23:41:56,827 ERROR [main] io.druid.cli.CliBroker - Error when starting up. Failing.
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method) ~[?:1.8.0_40]
    at java.lang.Thread.start(Thread.java:714) ~[?:1.8.0_40]
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950) ~[?:1.8.0_40]
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) ~[?:1.8.0_40]
    at org.jboss.netty.util.internal.DeadLockProofWorker.start(DeadLockProofWorker.java:38) ~[netty-3.10.6.Final.jar:?]
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:368) ~[netty-3.10.6.Final.jar:?]
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:100) ~[netty-3.10.6.Final.jar:?]
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:52) ~[netty-3.10.6.Final.jar:?]
    at org.jboss.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45) ~[netty-3.10.6.Final.jar:?]
    at org.jboss.netty.channel.socket.nio.NioWorkerPool.newWorker(NioWorkerPool.java:44) ~[netty-3.10.6.Final.jar:?]
    at org.jboss.netty.channel.socket.nio.NioWorkerPool.newWorker(NioWorkerPool.java:28)

sbouguerra · 03-21-2018

I think your classpath is missing the HDFS module that is under the extensions directory...

sbouguerra · 03-06-2018

I believe the HS2 (HiveServer2) log is at /var/log/hive/hiveserver2.log.

Last Visited	09-07-2019 05:22 AM

Member Since	08-05-2016 05:14 PM
Posts	76
Kudos received	10

Cloudera Community

Re: Druid kafka ingestion from Hive - HDP 3.0

Re: Hive to Druid Methodology

Re: Druid on Hive LLAP - HDP2.6.1

Re: Druid integration with Hive LLAP

Re: Druid 0.10.1 on Ambari

Re: Druid kafka ingestion from Hive - HDP 3.0

Re: Druid kafka ingestion from Hive - HDP 3.0

Re: Hive to Druid Methodology

Re: Hive to Druid Methodology

Re: How to load delta data on daily basis into Hiv...

Re: Druid on Hive LLAP - HDP2.6.1

Re: Druid integration with Hive LLAP

Druid node failing with OOM "java.lang.OutOfMemory...

Re: What would be the right command to start Druid...

Re: How to debug Hive2 beeline ?