In-Memory Layer

I am looking for the best option for in-memory computing on fast data. We need the fastest possible access to our most recent data (current, last 5 minutes, last hour, less than 1 day old).

It's probably 500 GB or less.

Something like Pivotal's Butterfly Architecture.

What will work best for keeping some of this fast data? I have been looking at Apache Geode, Apache Ignite, Alluxio, SnappyData, Redis, HDFS Ram Data Nodes, HBase In-Memory Column Families, Kafka, Spark Streaming.

Any baked solutions out there that work with HDP?

1 ACCEPTED SOLUTION

Hi @Timothy Spann

It really all depends on your particular use case and requirements. First, I'm assuming you have a custom-built application that will be querying this data store. If so, how complex do the queries need to be? Do you need a relational (SQL) store or a key-value store? Also, how much latency can you afford?

I would first explore if HBase (or HBase + Phoenix) would be sufficient. This will reduce the number of moving parts you have.

If you're set on in-memory data grids/stores, then some options would be Redis, Hazelcast, Terracotta BigMemory and GridGain (Apache Ignite). I believe the last two have connectors to Hadoop that allow writing the results of MR jobs directly to the data grid (you'll need to confirm that functionality, though).

Like I said before though, I recommend you exhaust the HBase option before moving out-of-stack.
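
To make the key-value question above a bit more concrete, here is a rough Python sketch of the access pattern being described: a store that only serves entries newer than a time window (5 minutes, 1 hour, 1 day) and drops everything older. It is a toy, single-process illustration under stated assumptions, not how any of the products mentioned are implemented; `RecentStore`, `ttl_seconds` and `clock` are made-up names for this example.

```python
import time
from collections import OrderedDict


class RecentStore:
    """Toy in-process key-value store that keeps only entries younger than ttl_seconds.

    Hypothetical illustration only; names and design are assumptions.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._data = OrderedDict()  # key -> (written_at, value), oldest entry first

    def put(self, key, value):
        self._evict()
        self._data.pop(key, None)  # re-insert so the key moves to the newest end
        self._data[key] = (self.clock(), value)

    def get(self, key, default=None):
        self._evict()
        entry = self._data.get(key)
        return entry[1] if entry is not None else default

    def __len__(self):
        self._evict()
        return len(self._data)

    def _evict(self):
        # Entries are in write order, so expired ones are always at the front.
        cutoff = self.clock() - self.ttl
        while self._data:
            key, (written_at, _) = next(iter(self._data.items()))
            if written_at > cutoff:
                break
            del self._data[key]


if __name__ == "__main__":
    store = RecentStore(ttl_seconds=300)  # keep the last 5 minutes
    store.put("sensor-42", {"temp": 21.5})
    print(store.get("sensor-42"))
```

If lookups by key within a time window like this are all you need, a key-value store with expiry covers it: Redis supports per-key expiry (for example `SET key value EX seconds`), and HBase supports a TTL per column family. If you need joins or ad hoc filters across the recent data, that points toward the SQL side (e.g. HBase + Phoenix).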
