Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Optimal Location to Store for Interactive Retrieval in a Modern Data Web Application (and also ad-hoc reporting)

Master Guru

Where should a 672-dimension, million-record dataset be stored for online applications?

How would you store it and lay it out? Most queries work with smaller subsets of the dimensions, say 20-30 at a time.

HBase, or HBase + Phoenix, has been considered.

Or would Hive + Tez + ORC work well?

Should it be cached in something like Apache Ignite, Apache Geode, or Redis?

Any suggestions? Looking for best practices for a greenfield application.

1 ACCEPTED SOLUTION

Master Guru

Druid is another option and is supported in HDP 2.6.


4 REPLIES

Master Collaborator

Have you looked at Apache Kylin (which is built on top of HBase)?

http://kylin.apache.org/

Master Collaborator

Master Guru

Is Kylin a supported part of the HDP 2.4 stack?

Master Guru

Druid is another option and is supported in HDP 2.6.