
TimothySpann · ‎06-09-2016

Where to store a 672-dimension, million-record dataset for online applications?

How would you store it and lay it out? Most queries work with smaller subsets of the dimensions, say 20-30 at a time.

HBase, or HBase + Phoenix, has been considered.

Or would Hive + Tez + ORC work well?

Should it be cached in something like Apache Ignite, Apache Geode, or Redis?

Any suggestions? Looking for best practices for a greenfield application.
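To make the access pattern concrete, here is a rough sizing sketch in Python. It only compares how much raw data a query would touch if it read every dimension versus only the 30 it needs. The 8-byte value size and the absence of compression are assumptions for illustration, not facts about the actual dataset, and the function name is made up for this sketch.

```python
# Back-of-the-envelope sizing. Values marked "assumption" are NOT from the thread.
RECORDS = 1_000_000        # "million record" dataset from the question
DIMENSIONS = 672           # total dimensions from the question
COLUMNS_PER_QUERY = 30     # upper end of "20-30 at a time"
BYTES_PER_VALUE = 8        # assumption: each value is a 64-bit number, uncompressed


def bytes_scanned(records, columns, bytes_per_value=BYTES_PER_VALUE):
    """Uncompressed bytes touched when reading `columns` values for every record."""
    return records * columns * bytes_per_value


# Reading whole rows touches all 672 dimensions per record.
full_dataset = bytes_scanned(RECORDS, DIMENSIONS)
# Reading only the requested columns touches 30 dimensions per record.
column_subset = bytes_scanned(RECORDS, COLUMNS_PER_QUERY)

print(f"all dimensions: {full_dataset / 1e9:.2f} GB")
print(f"30 dimensions:  {column_subset / 1e9:.2f} GB")
print(f"fraction read:  {column_subset / full_dataset:.1%}")
```

Under these assumptions the whole dataset is only a few gigabytes and a typical query needs under 5% of it, which is why both a column-pruning layout and an in-memory cache look plausible here.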

TimothySpann · ‎11-22-2017

Druid is another option and is supported in HDP 2.6.

tyu · ‎06-09-2016

Have you looked at Apache Kylin (which is built on top of HBase)?

tyu · ‎06-09-2016

See also

TimothySpann · ‎06-10-2016

Is Kylin a supported part of the HDP 2.4 stack?


Optimal Location to Store for Interactive Retrieval in a Modern Data Web Application (and also ad-hoc reporting)