Member since: 08-05-2016
Posts: 76
Kudos Received: 10
Solutions: 13
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1692 | 11-06-2018 08:19 PM |
| | 1101 | 08-31-2018 05:34 PM |
| | 843 | 05-02-2018 02:21 PM |
| | 1242 | 04-28-2018 09:32 PM |
| | 1253 | 11-09-2017 06:02 PM |
02-09-2019
02:00 AM
1 Kudo
I bet your column names contain upper-case characters in the Druid tables, so Hive is querying the wrong columns, since it always lower-cases column names.
11-12-2018
07:37 PM
Can you please explain what you mean by rebuild? What is the source of truth? Is it an append? Is it removing some rows? Can you please add an example?
11-09-2018
12:52 AM
You see that exception because MySQL is the default metadata driver for the Druid-Hive integration; you need to set hive.druid.metadata.db.type=derby. I also want to make two points. First, Derby is only used for integration testing, it will only work on one host, and I have not had a chance to test it outside of that scope. Second, please keep in mind that Hive is not case sensitive and lower-cases all your column names, while Druid is case sensitive, so I recommend lower-casing all the column names.
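To illustrate both points, a minimal sketch; the table and column names here are made up for the example:
-- use Derby as the Druid metadata store (single host, integration testing only)
set hive.druid.metadata.db.type=derby;
-- keep every column lower-case so Hive's down-casing matches the Druid datasource
CREATE TABLE druid_demo
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS SELECT cast(`eventtime` as timestamp) as `__time`, `userid`, `clicks` FROM source_events;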
11-07-2018
07:07 PM
Correct, @Matthieu Lamairesse: Druid is case sensitive while Hive is not, so to make it work you need to make sure that all the columns are in lower-case format.
11-07-2018
03:47 PM
You can set the Kerberos credentials as part of the consumer properties. Take a look at this thread and let me know if it works for you: https://groups.google.com/forum/#!topic/druid-user/W2SiPnNsy0U
11-07-2018
03:42 PM
The Druid Hive handler does not use MapReduce; it uses whatever Hive is using as its execution engine, e.g. Tez.
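So, as a rough sketch (the table names are placeholders), whatever engine the Hive session is configured with is what the Druid insert runs on:
-- the Druid storage handler jobs run on Hive's configured engine
set hive.execution.engine=tez;
INSERT INTO TABLE druid_table
SELECT cast(`timecolumn` as timestamp) as `__time`, `userid`, `num_l` FROM test_table;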
11-06-2018
08:19 PM
I looked at the code, and in the current state of the art the timestamp column is hard-coded to be __time, which is why you are getting the exceptions, since your column is called `timestamp`. https://github.com/apache/hive/blob/a51e6aeaf816bdeea5e91ba3a0fab8a31b3a496d/druid-handler/src/java/org/apache/hadoop/hive/druid/DruidStorageHandler.java#L301 If this is the case, it is a serious limitation and needs to be fixed. @Nishant Bangarwa, what do you think?
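As a workaround, aliasing your `timestamp` column to `__time` when loading the Druid table should avoid the exception; a sketch, with a hypothetical source table:
-- rename the source `timestamp` column to the expected __time on the way in
CREATE TABLE druid_events
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS SELECT cast(`timestamp` as timestamp) as `__time`, `userid`, `metric1` FROM source_events;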
11-06-2018
07:34 PM
This is a known bug that is hit when you try to create an empty table. A quick fix is to create the table and insert data at the same time using a CREATE TABLE AS SELECT statement, or update to a version with the actual bug fix: https://issues.apache.org/jira/browse/HIVE-16677
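A minimal sketch of the CTAS route, using made-up table and column names:
-- create the Druid-backed table and load it in one statement, so it is never empty
CREATE TABLE druid_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS SELECT cast(`timecolumn` as timestamp) as `__time`, `userid`, `num_l` FROM test_table;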
10-17-2018
04:08 PM
@Nishant Bangarwa can you please take a look at this?
09-26-2018
01:31 AM
Can you please add the logs of the Coordinator and the Overlord, plus the task logs?
08-31-2018
05:57 PM
If you are interested in rolling up SUM and COUNT, then you can output the raw data as is, and "druid.query.granularity"="HOUR" (FYI, not segment granularity) will do the rollup for you. If you want to compute other rollup metrics, such as MIN/MAX/AVG, then you need to do the rollup beforehand. If you share an example of your use case, I can help explain more. Thanks.
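A sketch of the first case, with illustrative table and column names:
-- raw rows go in as is; a query granularity of HOUR lets Druid roll SUM/COUNT metrics up to hourly buckets
CREATE TABLE druid_hourly
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY", "druid.query.granularity" = "HOUR")
AS SELECT cast(`eventtime` as timestamp) as `__time`, `userid`, `num_l` FROM raw_events;
-- for MIN/MAX/AVG style rollups, pre-aggregate in the SELECT before loading instead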
08-31-2018
05:38 PM
It seems like it is having an issue connecting to jdbc:mysql://${DRUID_HOST}/druid; are you sure the value of DRUID_HOST is the actual hostname and port of the MySQL DB? Can you share the logs of HS2?
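For reference, these are the properties I would double-check; the values below are placeholders:
-- the metadata URI must point at the real MySQL host/port, not a Druid node
set hive.druid.metadata.db.type=mysql;
set hive.druid.metadata.uri=jdbc:mysql://<mysql-host>:3306/druid;
set hive.druid.metadata.username=<metadata-user>;
set hive.druid.metadata.password=<metadata-password>;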
08-31-2018
05:34 PM
1 Kudo
Please take a look at this page: HDP-2.6-Hive-Druid
How much pre-processing is needed on the Hive table creation? Should I "clean" the data such that there is no further aggregation in Druid, or will the granularity settings take care of aggregation on the Druid side? And if so, where should those aggregations be defined? (So if I want "HOUR" granularity, should I pre-process the table to group by the hours already and do all the aggregations within Hive?)
Use "druid.query.granularity" = "HOUR".
Is there any support for "HyperUnique" in this workflow? I am looking to do something like "unique user IDs".
Sorry, no: there are no complex metrics.
One of my challenges is that new metrics are added on a weekly/monthly basis. How will I support that if I need to load the data daily into Druid? How would you handle schema evolution?
Use INSERT INTO for newer data and ALTER TABLE to add new columns (see the sketch below).
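A rough sketch of that pattern, with hypothetical table and column names, assuming ALTER TABLE ... ADD COLUMNS is supported on the Druid-backed table as described above:
-- append each daily load to the existing Druid datasource
INSERT INTO TABLE druid_table
SELECT cast(`eventtime` as timestamp) as `__time`, `userid`, `num_l` FROM daily_staging;
-- when a new metric appears, extend the schema first, then keep appending
ALTER TABLE druid_table ADD COLUMNS (`new_metric` float);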
05-07-2018
05:02 PM
This is a fixed bug; it looks like you are on an older version. The fix is here: https://issues.apache.org/jira/browse/HIVE-16677 The workaround is to use a CTAS that inserts at least one row.
05-02-2018
02:29 PM
The speed at which the indexes are generated depends on the Hive LLAP workers, not Druid itself. You might try changing "druid.segment.granularity" = "MONTH" to "DAY"; that will give you more parallelism and thus might run faster if LLAP has enough resources.
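For instance, something along these lines (table and column names are illustrative):
-- DAY segments give more, smaller units of work than MONTH segments, so LLAP can build them in parallel
CREATE TABLE druid_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY")
AS SELECT cast(`timecolumn` as timestamp) as `__time`, `userid`, `num_l` FROM source_table;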
05-02-2018
02:25 PM
Use an INSERT INTO statement:
create table test_table(`timecolumn` timestamp, `userid` string, `num_l` float);
insert into test_table values ('2015-01-08 00:00:00', 'i1-start', 4);
CREATE TABLE druid_table (`__time` timestamp, `userid` string, `num_l` float)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "DAY");
INSERT INTO TABLE druid_table
select cast(`timecolumn` as timestamp) as `__time`, `userid`, `num_l` FROM test_table;
05-02-2018
02:21 PM
It comes from the Hive conf as a default. You can set hive.druid.storage.storageDirectory from Ambari to make it global for all sessions: https://github.com/b-slim/hive/blob/f8bc4868eced2ca83113579b626e279bbe6d5b13/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2626
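If you only want to override it for one session instead of globally, a sketch (the path is just an example):
-- deep-storage directory the Hive-Druid handler writes segments to
set hive.druid.storage.storageDirectory=/apps/druid/warehouse;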
04-28-2018
09:32 PM
@ln Chari the properties you need are:
set hive.druid.metadata.db.type=postgresql;
set hive.druid.metadata.username=${DRUID_USERNAME};
set hive.druid.metadata.password=${DRUID_PASSWORD};
set hive.druid.metadata.uri=jdbc:postgresql://${METADATA_DRUID_HOST}:5432/druid;
You were missing hive.druid.metadata.db.type. Also make sure that placeholders like ${DRUID_USERNAME} are replaced with actual values. You do not need all the properties starting with druid.*
04-03-2018
11:30 PM
In this case, increasing the ulimit on the number of processes for the Druid user will fix this issue. This article explains the issue and the best way to fix it in more detail. Exception stack:
2018-04-02T23:41:56,827 ERROR [main] io.druid.cli.CliBroker - Error when starting up. Failing.
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method) ~[?:1.8.0_40]
at java.lang.Thread.start(Thread.java:714) ~[?:1.8.0_40]
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950) ~[?:1.8.0_40]
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) ~[?:1.8.0_40]
at org.jboss.netty.util.internal.DeadLockProofWorker.start(DeadLockProofWorker.java:38) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:368) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:100) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:52) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.NioWorkerPool.newWorker(NioWorkerPool.java:44) ~[netty-3.10.6.Final.jar:?]
at org.jboss.netty.channel.socket.nio.NioWorkerPool.newWorker(NioWorkerPool.java:28)
03-21-2018
03:03 AM
I think your classpath is missing the HDFS module that is under the extensions directory...
03-01-2018
01:48 AM
Do you see this only with Druid indexing jobs?
02-27-2018
02:30 AM
Here is a step-by-step tutorial: http://druid.io/docs/0.11.0/tutorials/tutorial-kafka.html
02-27-2018
02:28 AM
If this is the first run, it takes some time for Superset to discover the data (it has to wait for the first handoff of data from the real-time nodes to the historicals); after that it should work. I believe the default handoff window is 15 minutes.
12-22-2017
02:55 PM
Hi,
Thanks for trying out this new Hive Feature.
The Scan query was very recently released as part of the core Druid 0.11.0 engine.
Yes, we are working on the integration of the Scan query; stage one is done here https://github.com/apache/calcite/pull/577 and stage two will follow shortly: https://issues.apache.org/jira/browse/HIVE-17627.
We are also working on pushing more computations down to Druid, as you can see here https://github.com/apache/calcite/pulls?utf8=%E2%9C%93&q=druid and https://issues.apache.org/jira/browse/HIVE/component/12330863/?selectedTab=com.atlassian.jira.jira-projects-plugin:component-summary-panel.
For the question:
"In Tableau I see there is a way to define custom queries, but this is more of an extract rather than a live connection. Is there a better way to do live connections in Hive using custom queries?"
I am not really sure if there is a better way; I hope the Tableau experts can answer this. @Carter Shanklin, any chance you can dive in?
12-05-2017
10:50 PM
Can you create a new issue and attach the logs? It is hard to see what is going on without logs. One way to check this is to first issue an EXPLAIN query, which will show you the Druid query; you can then copy that query and try it yourself via a curl command.
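For example, something along these lines (the table name and filter are made up) prints the plan, which includes the Druid query Hive generated:
-- the plan for a Druid-backed table shows the pushed-down Druid query
EXPLAIN SELECT `userid`, count(*) FROM druid_table WHERE `__time` >= cast('2017-01-01 00:00:00' as timestamp) GROUP BY `userid`;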
11-30-2017
06:00 PM
1 Kudo
FYI, Derby is a local, single-instance DB used only for testing. For production, please use MySQL or Postgres.
11-27-2017
04:03 PM
Most of the CLIs you are referring to are deprecated. Please refer to this post to read about the differences: https://community.hortonworks.com/questions/135182/hive-cli-vs-beeline.html
11-27-2017
03:57 PM
It all depends on how you build your schema, but it is doable; in my opinion it comes down to the cardinality of your customer base and/or the number of products. You might also be able to use sketches if approximation is okay for your use cases.
11-24-2017
03:55 PM
This is related to this question: https://community.hortonworks.com/questions/102905/hive-druid-handler-javalangnoclassdeffounderror-or.html