Member since: 08-31-2018
Posts: 4
Kudos Received: 0
Solutions: 0
08-31-2018 05:41 PM
Thanks @Slim! I went over the doc, but I'm still not 100% sure about the relationship between the segment granularity and the output table. Again, should I be aggregating within the Hive query at all, or can I output "raw" data and let the granularity setting aggregate for me (and if so, does it support sum/count etc. out of the box)?
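For example, I'm not sure which of these two shapes is the intended one. A rough sketch with placeholder table/column names, based on my reading of the Hive-Druid integration docs, so please correct me if I've got the properties wrong:

```sql
-- Option A: pre-aggregate in Hive to the target granularity myself.
-- (__time is the timestamp column the Druid storage handler expects.)
CREATE TABLE metrics_druid
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.segment.granularity" = "HOUR")
AS
SELECT
  CAST(date_format(event_time, 'yyyy-MM-dd HH:00:00') AS timestamp) AS `__time`,
  user_id,
  COUNT(*)   AS event_count,
  SUM(bytes) AS total_bytes
FROM events_parquet
GROUP BY CAST(date_format(event_time, 'yyyy-MM-dd HH:00:00') AS timestamp), user_id;

-- Option B: hand over the raw rows and rely on the granularity settings
-- to roll up counts/sums on the Druid side (assuming event_time is already a timestamp).
CREATE TABLE metrics_druid_raw
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "druid.segment.granularity" = "DAY",
  "druid.query.granularity"   = "HOUR"
)
AS
SELECT event_time AS `__time`, user_id, bytes
FROM events_parquet;
```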
08-31-2018 05:11 PM
I'm looking into loading existing Hive tables stored as Parquet into Druid, and I have a few questions:
- I thought about doing it directly from Druid, without Hive, but it seems Druid does not support nested Parquet objects. Has anyone had the same issue?
- How much pre-processing is needed when creating the Hive table? Should I "clean" the data so that no further aggregation happens in Druid, or will the granularity settings take care of aggregation on the Druid side? If so, where should those aggregations be defined? (For example, if I want "HOUR" granularity, should I pre-process the table to group by hour and do all the aggregations within Hive?)
- Is there any support for "HyperUnique" in this workflow? I'm looking to do something like "unique user ids".
- One of my challenges is that new metrics are added on a weekly/monthly basis. How can I support that if I need to load the data into Druid daily? How would you handle schema evolution?
- I haven't found documentation for all the Druid-related configurations in Hive (such as "hive.druid.broker.address.default"). Would you mind pointing me to it?
Thanks! Shahar
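PS: to make the workflow concrete, the overall shape of what I'm trying looks roughly like this (placeholder names everywhere; the storage handler and TBLPROPERTIES keys are what I understood from the Hive-Druid integration docs, so please correct me if I've misread them):

```sql
-- Point Hive at the Druid broker (example address only).
SET hive.druid.broker.address.default=druid-broker-host:8082;

-- CTAS from the existing Parquet table into a Druid-backed table.
-- The Druid storage handler expects the timestamp column to be named __time.
CREATE TABLE page_views_druid
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "druid.segment.granularity" = "DAY",   -- how segments are partitioned
  "druid.query.granularity"   = "HOUR"   -- finest time resolution kept in the segments
)
AS
SELECT
  CAST(event_time AS timestamp) AS `__time`,
  user_id,
  page,
  views
FROM page_views_parquet;
```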
Labels:
Apache Hive