Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

In-Memory Layer

Master Guru

I am looking for the best option for in-memory computing of fast data. We need the fastest possible access to our most recent data (current, last 5 minutes, last hour, < 1 day).

It's probably 500 GB or less.

Something like Pivotal's Butterfly Architecture.

What will work best for keeping some of this fast data? I have been looking at Apache Geode, Apache Ignite, Alluxio, SnappyData, Redis, HDFS Ram Data Nodes, HBase In-Memory Column Families, Kafka, Spark Streaming.

Any baked solutions out there that work with HDP?
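The tiered hot-data requirement above (current, 5 minutes, 1 hour, < 1 day) can be sketched as a simple in-process store with per-tier TTLs and lazy eviction. This is only an illustration of the access pattern, not a recommendation over the products listed; all names here are hypothetical:

```python
import time

class RecentStore:
    """Minimal sketch of a tiered hot-data cache with lazy TTL eviction."""

    # Tiers mirror the question: current, 5 minutes, 1 hour, < 1 day.
    TIERS = {"current": 60, "5min": 300, "1h": 3600, "1d": 86400}

    def __init__(self):
        self._data = {}  # key -> (value, inserted_at, ttl_seconds)

    def put(self, key, value, tier="current"):
        self._data[key] = (value, time.time(), self.TIERS[tier])

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, inserted_at, ttl = entry
        if time.time() - inserted_at > ttl:
            # Expired: evict lazily on read instead of running a sweeper.
            del self._data[key]
            return None
        return value

store = RecentStore()
store.put("sensor:42", 98.6, tier="5min")
print(store.get("sensor:42"))
```

In a real deployment the same pattern (write with a TTL chosen by data age, expire automatically) maps directly onto Redis key expiry or an HBase column-family TTL.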

1 ACCEPTED SOLUTION


Hi @Timothy Spann

It really all depends on your particular use case and requirements. First, I'm assuming you have a custom-built application that will be querying this data store. If so, how complex do the queries need to be? Do you need a relational (SQL) or a key-value store? Also, how much latency can you afford?

I would first explore if HBase (or HBase + Phoenix) would be sufficient. This will reduce the number of moving parts you have.
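For example, HBase can pin a table's data in the block cache via the IN_MEMORY column-family attribute, and a TTL can age out anything older than a day. With Phoenix on top, both can be set straight from SQL DDL. A hedged sketch (the table and column names are hypothetical; verify the property names against your HDP version):

```sql
-- IN_MEMORY and TTL are HBase properties that Phoenix passes through
-- to the underlying table; TTL is in seconds (86400 = 1 day).
CREATE TABLE recent_events (
    event_id VARCHAR PRIMARY KEY,
    payload  VARCHAR
) IN_MEMORY=true, TTL=86400;
```

Note that IN_MEMORY only prioritizes the family in the block cache; it does not guarantee the whole 500 GB stays resident, so size your region servers' cache accordingly.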

If you're set on in-memory data grids/stores, then some options would be Redis, Hazelcast, Terracotta BigMemory, and GridGain (Apache Ignite). I believe the last two have connectors to Hadoop that allow writing results of MR jobs directly to the data grid (you'll need to confirm that functionality though).

Like I said before though, I recommend you exhaust the HBase option before moving out-of-stack.

