Member since 08-05-2016
76 Posts
10 Kudos Received
13 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2763 | 11-06-2018 08:19 PM |
| | 1995 | 08-31-2018 05:34 PM |
| | 1473 | 05-02-2018 02:21 PM |
| | 2163 | 04-28-2018 09:32 PM |
| | 2385 | 11-09-2017 06:02 PM |
11-07-2018
07:07 PM
Correct, @Matthieu Lamairesse: Druid is case sensitive while Hive is not, so to make it work you need to make sure that all the column names are lowercase.
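A minimal sketch of the lowercasing step in Python. The helper `lowercase_columns` is hypothetical (not part of Hive or Druid). It lowercases a list of column names and refuses two source names that would collide once case is ignored, because only one of them could survive the conversion.

```python
def lowercase_columns(columns):
    """Return the column names in lowercase.

    Hypothetical helper: raises ValueError if two names differ only by case,
    since both would map to the same lowercase column name.
    """
    seen = {}  # lowercase name -> original spelling
    result = []
    for name in columns:
        lowered = name.lower()
        if lowered in seen:
            raise ValueError(
                f"columns {seen[lowered]!r} and {name!r} collide as {lowered!r}"
            )
        seen[lowered] = name
        result.append(lowered)
    return result


print(lowercase_columns(["__time", "UserId", "Num_L"]))
# ['__time', 'userid', 'num_l']
```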
11-06-2018
08:19 PM
I looked at the code, and it seems that in its current state the timestamp column is hard-coded to be `__time`; that is why you are getting the exceptions, since your column is called `timestamp`. https://github.com/apache/hive/blob/a51e6aeaf816bdeea5e91ba3a0fab8a31b3a496d/druid-handler/src/java/org/apache/hadoop/hive/druid/DruidStorageHandler.java#L301 If this is the case, this is a serious limitation and needs to be fixed. @Nishant Bangarwa, what do you think?
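Assuming the limitation is as described, one workaround is to alias the source column to `__time` while loading the Druid table, the same cast-and-alias pattern used in the INSERT example from 05-02-2018 on this page. The Python helper `druid_select_list` is hypothetical; it only builds the HiveQL SELECT list as a string (time column first, cast and aliased, other columns unchanged) and does not connect to Hive.

```python
def druid_select_list(columns, time_column):
    """Build a HiveQL SELECT list exposing `time_column` as `__time`.

    Hypothetical helper: the time expression comes first, followed by the
    remaining columns in their original order, all backtick-quoted
    (`timestamp` is a reserved word in Hive, so quoting matters).
    """
    if time_column not in columns:
        raise ValueError(f"unknown time column {time_column!r}")
    time_expr = f"cast(`{time_column}` as timestamp) as `__time`"
    others = [f"`{name}`" for name in columns if name != time_column]
    return ", ".join([time_expr] + others)


print(druid_select_list(["userid", "timestamp", "num_l"], "timestamp"))
# cast(`timestamp` as timestamp) as `__time`, `userid`, `num_l`
```

The resulting string would go into an `INSERT INTO TABLE ... SELECT <list> FROM ...` statement like the one shown in the later post.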
08-31-2018
05:57 PM
If you are only interested in rolling up SUM and COUNT, you can output the raw data as is, and "druid.query.granularity"="HOUR" (note: this is the query granularity, not the segment granularity) will do the rollup for you. If you want to compute other rollup metrics, such as MIN/MAX/AVG, then you need to do the rollup beforehand. If you share an example of your use case, I can help explain more. Thanks.
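To illustrate what that rollup means, here is a small Python simulation of the semantics only; it is not Druid code, and the function name and sample rows are hypothetical. Timestamps are truncated to the hour, rows that share the same hour and dimension values are merged, metrics are summed, and a row count is kept. Other aggregates such as MIN/MAX/AVG are not modeled here, matching the advice above to compute those before loading.

```python
from datetime import datetime


def hourly_rollup(rows, dimensions, metrics):
    """Sketch of HOUR-granularity rollup: {(hour, *dims): (sums, row_count)}."""
    sums = {}
    counts = {}
    for row in rows:
        # Truncate the event time to the start of its hour.
        hour = row["__time"].replace(minute=0, second=0, microsecond=0)
        key = (hour,) + tuple(row[d] for d in dimensions)
        bucket = sums.setdefault(key, {m: 0 for m in metrics})
        for m in metrics:
            bucket[m] += row[m]  # SUM-style metrics merge by addition
        counts[key] = counts.get(key, 0) + 1  # COUNT of raw rows
    return {key: (sums[key], counts[key]) for key in sums}


rows = [
    {"__time": datetime(2015, 1, 8, 0, 5), "userid": "i1-start", "num_l": 4.0},
    {"__time": datetime(2015, 1, 8, 0, 40), "userid": "i1-start", "num_l": 1.5},
    {"__time": datetime(2015, 1, 8, 1, 10), "userid": "i1-start", "num_l": 2.0},
]
for key, (metric_sums, row_count) in sorted(hourly_rollup(rows, ["userid"], ["num_l"]).items()):
    print(key[0].isoformat(), key[1:], metric_sums, row_count)
```

The first two rows fall in the same hour and collapse into one row, which is why raw data can be loaded as is when only SUM and COUNT are needed.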
08-31-2018
05:34 PM
1 Kudo
Please take a look at this page: HDP-2.6-Hive-Druid
Q: How much pre-processing is needed when creating the Hive table? Should I "clean" the data so that there is no further aggregation in Druid, or will the granularity settings take care of aggregation on the Druid side? And if so, where should those aggregations be defined? (If I want "HOUR" granularity, should I pre-process the table to group by hour and do all the aggregations within Hive?)
A: Use "druid.query.granularity" = "HOUR".
Q: Is there any support for "HyperUnique" in this workflow? I am looking at doing something like "unique user ids".
A: Sorry, no complex metrics.
Q: One of my challenges is that new metrics are added on a weekly/monthly basis. How will I support that if I need to load the data daily into Druid? How would you handle schema evolution?
A: Use INSERT INTO for newer data and ALTER TABLE to add new columns.
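For the schema-evolution answer, a hypothetical Python helper can generate the ALTER TABLE statement when new metrics appear. `add_columns_statement` and the `num_m` column are illustrative only. The output uses Hive's `ALTER TABLE ... ADD COLUMNS (...)` syntax; whether a particular Hive/Druid version accepts it on a Druid-backed table is worth confirming on your cluster.

```python
def add_columns_statement(table, existing, desired):
    """Return an ALTER TABLE ... ADD COLUMNS statement, or None if nothing is new.

    Hypothetical helper: `existing` and `desired` map column names to Hive
    types; columns missing from `existing` are added in `desired` order.
    Type changes of existing columns are not handled.
    """
    new_cols = [(name, typ) for name, typ in desired.items() if name not in existing]
    if not new_cols:
        return None
    cols = ", ".join(f"`{name}` {typ}" for name, typ in new_cols)
    return f"ALTER TABLE {table} ADD COLUMNS ({cols})"


current = {"__time": "timestamp", "userid": "string", "num_l": "float"}
wanted = dict(current, num_m="float")  # hypothetical new metric
print(add_columns_statement("druid_table", current, wanted))
# ALTER TABLE druid_table ADD COLUMNS (`num_m` float)
```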
05-02-2018
02:25 PM
Use an INSERT INTO statement:
create table test_table(`timecolumn` timestamp, `userid` string, `num_l` float);
insert into test_table values ('2015-01-08 00:00:00', 'i1-start', 4);
CREATE TABLE druid_table (`__time` timestamp, `userid` string, `num_l` float)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY");
INSERT INTO TABLE druid_table
select cast(`timecolumn` as timestamp) as `__time`, `userid`, `num_l` FROM test_table;
05-02-2018
02:21 PM
It comes from the Hive configuration as a default. You can set hive.druid.storage.storageDirectory from Ambari to make it global for all sessions: https://github.com/b-slim/hive/blob/f8bc4868eced2ca83113579b626e279bbe6d5b13/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2626
04-28-2018
09:32 PM
@ln Chari, the properties you need are:
set hive.druid.metadata.db.type=postgresql;
set hive.druid.metadata.username=${DRUID_USERNAME};
set hive.druid.metadata.password=${DRUID_PASSWORD};
set hive.druid.metadata.uri=jdbc:postgresql://${METADATA_DRUID_HOST}:5432/druid;
You were missing hive.druid.metadata.db.type. Also make sure that placeholders such as ${DRUID_USERNAME} are replaced with actual values. You do not need all the properties starting with druid.*
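One way to make sure no `${...}` placeholder reaches Hive unreplaced is to render the settings with Python's `string.Template`, whose `substitute` method raises `KeyError` for any missing value. This is a sketch: the template mirrors the four `set` lines above, the values in the usage are made up, and passing `os.environ` instead would take them from environment variables.

```python
from string import Template

# The settings from the post; every ${...} placeholder must be filled in.
DRUID_METADATA_TEMPLATE = Template(
    "set hive.druid.metadata.db.type=postgresql;\n"
    "set hive.druid.metadata.username=${DRUID_USERNAME};\n"
    "set hive.druid.metadata.password=${DRUID_PASSWORD};\n"
    "set hive.druid.metadata.uri=jdbc:postgresql://${METADATA_DRUID_HOST}:5432/druid;"
)


def render_metadata_settings(values):
    """Fill in all placeholders; raises KeyError if any value is missing."""
    return DRUID_METADATA_TEMPLATE.substitute(values)


# Hypothetical values for illustration only.
print(render_metadata_settings({
    "DRUID_USERNAME": "druid",
    "DRUID_PASSWORD": "changeme",
    "METADATA_DRUID_HOST": "metadata-db.example.com",
}))
```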
04-03-2018
11:30 PM
In this case, increasing the ulimit on the number of processes for the Druid user will fix this issue. This article explains the issue in more detail and the best way to fix it.
Exception stack:
2018-04-02T23:41:56,827 ERROR [main] io.druid.cli.CliBroker - Error when starting up. Failing.
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method) ~[?:1.8.0_40]
at java.lang.Thread.start(Thread.java:714) ~[?:1.8.0_40]
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950) ~[?:1.8.0_40]
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) ~[?:1.8.0_40]
at org.jboss.netty.util.internal.DeadLockProofWorker.start(DeadLockProofWorker.java:38) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:368) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:100) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:52) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.NioWorkerPool.newWorker(NioWorkerPool.java:44) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.NioWorkerPool.newWorker(NioWorkerPool.java:28)
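To check the current limit, this Python sketch reads `RLIMIT_NPROC` with the standard `resource` module (Unix only); run it as the Druid user. On Linux, threads count against this per-user limit, so a low soft limit is one common cause of "unable to create new native thread". Raising the limit happens outside Python (for example in /etc/security/limits.conf or the service manager configuration), depending on how Druid is started.

```python
import resource


def _normalize(value):
    """Map the platform's 'unlimited' sentinel to None."""
    return None if value == resource.RLIM_INFINITY else value


def nproc_limits():
    """Return the (soft, hard) max-user-processes limits; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return _normalize(soft), _normalize(hard)


soft, hard = nproc_limits()
print(f"max user processes: soft={soft}, hard={hard}")
```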
03-21-2018
03:03 AM
I think your classpath is missing the HDFS module that is under the extensions directory...
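A quick way to confirm that is to list the jars of the HDFS extension. This Python sketch assumes the extension directory is named `druid-hdfs-storage` (the name of the Druid core HDFS extension; your distribution may package it differently) and that you supply the path of the Druid extensions directory yourself. If the jars are present, it is also worth checking that the extension appears in `druid.extensions.loadList`.

```python
from pathlib import Path


def hdfs_extension_jars(extensions_dir, extension="druid-hdfs-storage"):
    """Return the sorted jar file names of the extension, or [] if it is missing."""
    ext_dir = Path(extensions_dir) / extension
    if not ext_dir.is_dir():
        return []
    return sorted(p.name for p in ext_dir.glob("*.jar"))


if __name__ == "__main__":
    # Hypothetical path: point this at your Druid installation's extensions directory.
    jars = hdfs_extension_jars("/path/to/druid/extensions")
    if jars:
        print("found:", ", ".join(jars))
    else:
        print("druid-hdfs-storage not found under the extensions directory")
```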