Created 06-09-2016 08:37 PM
Where to store a 672 dimension, million record dataset for online applications?
How would you store it and lay it out? Most queries work with smaller subsets of the dimensions, say 20-30 at a time.
HBase or HBase + Phoenix have been considered.
Or would Hive + Tez + ORC work well?
Should it be cached with something like Apache Ignite, Apache Geode, or Redis?
Any suggestions? Looking for best practices for a greenfield application.
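To make the access pattern concrete, here is a minimal sketch of the HBase + Phoenix option, assuming a Phoenix-enabled HBase cluster; the table name FEATURES, the RECORD_ID key, the DIM_* column names, and the ZooKeeper quorum string are all hypothetical placeholders, and only a few of the 672 dimensions are shown. The point is that Phoenix lets each query select just the 20-30 columns it needs, and HBase stores cells sparsely, so untouched columns are not read.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixWideTableSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical ZooKeeper quorum for a Phoenix-enabled HBase cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181:/hbase")) {
            try (Statement stmt = conn.createStatement()) {
                // Hypothetical layout: one row per record, each dimension as a typed column.
                // Only 3 of the 672 dimensions are shown here for brevity.
                stmt.execute(
                    "CREATE TABLE IF NOT EXISTS FEATURES (" +
                    "  RECORD_ID BIGINT NOT NULL PRIMARY KEY, " +
                    "  DIM_001 DOUBLE, DIM_002 DOUBLE, DIM_003 DOUBLE)");
            }

            // An online lookup touches only the dimensions it needs (here, 2 of 672).
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT DIM_001, DIM_002 FROM FEATURES WHERE RECORD_ID = ?")) {
                ps.setLong(1, 42L);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getDouble("DIM_001") + " " + rs.getDouble("DIM_002"));
                    }
                }
            }
        }
    }
}
```

Whether this beats Hive + Tez + ORC depends on latency needs: point lookups by key favor HBase/Phoenix, while scan-heavy analytics over many rows favor ORC.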
Created 11-22-2017 04:45 PM
Druid is another option and is supported in HDP 2.6
Created 06-09-2016 08:54 PM
Have you looked at Apache Kylin (which is built on top of HBase)?
Created 06-10-2016 02:49 AM
Is Kylin a supported part of the HDP 2.4 stack?