Optimal Location to Store for Interactive Retrieval in a Modern Data Web Application (and also ad-hoc reporting)

Master Guru

Where should a 672-dimension, million-record dataset be stored for online applications?

How would you store it and lay it out? Most queries work with smaller subsets of the dimensions, say 20-30 at a time.

HBase, or HBase + Phoenix, has been considered.

Or would Hive + Tez + ORC work well?

Should it be cached in something like Apache Ignite, Apache Geode, or Redis?

Any suggestions? Looking for best practices for a greenfield application.
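
To make the access pattern above concrete, here is a minimal Python sketch comparing how many values a naive row-oriented layout and a column-oriented layout touch when a query needs only 25 of the 672 dimensions. The function names, the scaled-down record count, and the random data are assumptions for illustration only. Real engines (HBase, ORC, Druid) add indexing, compression, and block-level I/O that this ignores.

```python
import random

NUM_DIMENSIONS = 672  # from the question
NUM_RECORDS = 1_000   # assumption: scaled down from one million so this runs instantly
QUERY_WIDTH = 25      # the question says 20-30 dimensions per query


def build_rows(num_records, num_dims, seed=0):
    """Row-oriented layout: one list of all dimensions per record."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(num_dims)] for _ in range(num_records)]


def to_columns(rows):
    """Column-oriented layout: one list of all records per dimension."""
    return [list(col) for col in zip(*rows)]


def query_rows(rows, wanted):
    """Sum the wanted dimensions; a naive row scan loads every full record."""
    values_read = sum(len(row) for row in rows)
    totals = [sum(row[i] for row in rows) for i in wanted]
    return totals, values_read


def query_columns(columns, wanted):
    """Sum the wanted dimensions; only the requested columns are loaded."""
    values_read = sum(len(columns[i]) for i in wanted)
    totals = [sum(columns[i]) for i in wanted]
    return totals, values_read


rows = build_rows(NUM_RECORDS, NUM_DIMENSIONS)
columns = to_columns(rows)
wanted = list(range(QUERY_WIDTH))  # hypothetical query: the first 25 dimensions

row_totals, row_read = query_rows(rows, wanted)
col_totals, col_read = query_columns(columns, wanted)

print("values read, row layout:   ", row_read)
print("values read, column layout:", col_read)
```

Both layouts return the same totals, but under these simplifying assumptions the column layout reads 25/672 of the values, which is why column pruning matters for this kind of workload.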

1 ACCEPTED SOLUTION

Master Guru

Druid is another option, and it is supported in HDP 2.6.

4 REPLIES

Master Collaborator

Have you looked at Apache Kylin (which is built on top of HBase)?

http://kylin.apache.org/

Master Collaborator

Master Guru

Is Kylin a supported part of the HDP 2.4 stack?

Master Guru

Druid is another option, and it is supported in HDP 2.6.