Member since: 03-06-2017
Posts: 6
Kudos Received: 2
Solutions: 0
01-29-2018
04:52 PM
Hi, OK, I'll assume it's still not possible. Thanks for your answer!
01-25-2018
10:46 AM
1 Kudo
Hello everyone, In this post (from 2016) https://community.hortonworks.com/questions/67436/is-there-a-way-to-run-multiple-hiveserver2-interac.html it is stated that there is no high availability for HiveServer2 Interactive in the Technical Preview, but that "it is to be added". Now that Hive 2 is in General Availability, is it supported? Can we install two HiveServer2 Interactive instances? If it is not yet the case, could we have confirmation that it is planned, and in which version? Thanks for your help!
Labels:
Apache Hive
12-27-2017
04:39 PM
1 Kudo
Thanks a lot for this article. This is exactly what I needed!
10-18-2017
02:03 PM
I'm impressed to get such a quick reply! Thanks 🙂 Understood, it's expected that it falls back for now. However, maybe you should update the Hive documentation to say that TopN queries are currently unsupported? One final thing: it's great that you're planning to add it with an approximate flag. Maybe it's too early to know, but any idea when we could expect this feature?
10-18-2017
01:44 PM
Hi, Following the Hive documentation (https://cwiki.apache.org/confluence/display/Hive/Druid+Integration), I linked a Hive table to an existing Druid datasource using the DruidStorageHandler. I managed to run select, timeseries, and groupBy queries without any trouble, but it seems impossible to generate a TopN query: it always falls back to a groupBy. My datasource (with a DAY granularity) has this schema:

+------------+------------+--------------------+--+
|  col_name  | data_type  |      comment       |
+------------+------------+--------------------+--+
| __time     | timestamp  | from deserializer  |
| dimension1 | string     | from deserializer  |
| metric1    | bigint     | from deserializer  |
| dimension2 | string     | from deserializer  |
+------------+------------+--------------------+--+
The query I'm running is:

SELECT `dimension1`, `floor_day`(`__time`), sum(`metric1`) AS s
FROM my_db.my_table
GROUP BY `dimension1`, `floor_day`(`__time`)
ORDER BY s DESC
LIMIT 10;

If I EXPLAIN it, I get:

Plan optimized by CBO.
Stage-0
  Fetch Operator
    limit:-1
    Select Operator [SEL_1]
      Output:["_col0","_col1","_col2"]
      TableScan [TS_0]
        Output:["dimension1","floor_day","$f2"],properties:{"druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"my_datasource\",\"granularity\":\"all\",\"dimensions\":[{\"type\":\"default\",\"dimension\":\"dimension1\"},{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"floor_day\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",\"granularity\":\"day\",\"timeZone\":\"UTC\",\"locale\":\"en-US\"}}],\"limitSpec\":{\"type\":\"default\",\"limit\":10,\"columns\":[{\"dimension\":\"$f2\",\"direction\":\"descending\",\"dimensionOrder\":\"numeric\"}]},\"aggregations\":[{\"type\":\"longSum\",\"name\":\"$f2\",\"fieldName\":\"count\"}],\"intervals\":[\"1900-01-01T00:00:00.000/3000-01-01T00:00:00.000\"]}","druid.query.type":"groupBy"}
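One variant I've been meaning to try (my own sketch, not something the docs confirm): Druid's native topN ranks over a single dimension, so grouping on `dimension1` alone, without the `floor_day` time extraction, might be a shape that could map to a topN instead of falling back to groupBy. I'm not certain this mapping is guaranteed:

```sql
-- Hypothetical simplification (my assumption): a single GROUP BY dimension
-- is closer to Druid's native topN, which ranks one dimension at a time.
SELECT `dimension1`, sum(`metric1`) AS s
FROM my_db.my_table
GROUP BY `dimension1`
ORDER BY s DESC
LIMIT 10;
```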
I'm using HDP-2.6.2.0-205 with Hive 2.1.0 and Druid 0.9.2. Is there any configuration I'm missing? Is my query badly written? I tried to stick to the doc as much as possible. Thank you
Labels:
Apache Hive
03-06-2017
10:12 AM
Hi, I recently started to use the Capacity Scheduler. Basically I have two main queues: dev and prod. Each of them has a capacity of 50% and a max capacity of 100%. My dev queue uses the Fair ordering policy while my prod queue is FIFO. I also set the user-limit-factor to 2 (so each user could use 100% of the cluster if available). This configuration is now in production and I observed something surprising: even when the dev queue is unused (0%) and the prod queue is below 100%, containers in the prod queue are still being preempted.
- Can preemption happen even if the cluster is not 100% used?
- Can jobs preempt containers from the same queue?
For now I disabled preemption on the prod queue because there was too much of it, but I'm losing some elasticity between dev and prod 😞 Thanks for your help Pierre
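For context, here is roughly what my setup looks like as a capacity-scheduler.xml fragment. This is a sketch of the configuration described above, not a verbatim copy of my file; the queue names dev/prod are mine, and whether the per-queue `disable_preemption` property is available may depend on the HDP version:

```xml
<!-- Sketch of the described setup: two queues at 50% capacity, 100% max -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>dev,prod</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.user-limit-factor</name>
  <value>2</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.ordering-policy</name>
  <value>fair</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.prod.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.prod.user-limit-factor</name>
  <value>2</value>
</property>
<!-- The workaround I mentioned: preemption disabled for prod only -->
<property>
  <name>yarn.scheduler.capacity.root.prod.disable_preemption</name>
  <value>true</value>
</property>
```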
Labels:
Apache YARN