Created 07-20-2016 06:13 PM
I am looking for the best option for in-memory computing on fast data. We need the fastest possible access to our most recent data (current, last 5 minutes, last hour, up to one day old).
The total is probably 500 GB or less.
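The access pattern described above can be sketched in a few lines. This is a hypothetical, single-process illustration, not how Geode, Ignite, Redis or any other product listed below works internally. It keeps only data newer than a retention window (one day here) and answers "last N minutes" queries with a sorted-map range lookup. The class name `RecentWindowStore` is made up for illustration, and the sketch assumes one value per millisecond timestamp. A map in one JVM will not comfortably hold ~500 GB, which is why the distributed stores in the question matter.

```java
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: keeps only recent events in memory, sorted by timestamp.
final class RecentWindowStore {
    private final long retentionMillis;
    // Assumption: one value per millisecond timestamp; a real store would key by (timestamp, id).
    private final ConcurrentSkipListMap<Long, String> events = new ConcurrentSkipListMap<>();

    RecentWindowStore(long retentionMillis) {
        this.retentionMillis = retentionMillis;
    }

    void put(long timestampMillis, String value) {
        events.put(timestampMillis, value);
    }

    // Removes every event strictly older than (now - retention).
    void evict(long nowMillis) {
        events.headMap(nowMillis - retentionMillis).clear();
    }

    // Returns a live view of events with timestamps in [now - window, now].
    NavigableMap<Long, String> lastWindow(long nowMillis, long windowMillis) {
        return events.subMap(nowMillis - windowMillis, true, nowMillis, true);
    }

    int size() {
        return events.size();
    }
}
```

For example, with a one-day retention (`86_400_000L` ms), calling `evict(now)` periodically bounds memory use, and `lastWindow(now, 300_000L)` returns the last five minutes.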
We want something like Pivotal's Butterfly Architecture.
What will work best for keeping this fast data in memory? I have been looking at Apache Geode, Apache Ignite, Alluxio, SnappyData, Redis, HDFS RAM DataNodes, HBase in-memory column families, Kafka, and Spark Streaming.
Are there any ready-made solutions that work with HDP?
Created 08-15-2016 10:14 PM
It really all depends on your particular use case and requirements. First, I'm assuming you have a custom-built application that will be querying this data store. If so, how complex do the queries need to be? Do you need a relational (SQL) store or a key-value store? Also, how much latency can you tolerate?
I would first explore whether HBase (or HBase + Phoenix) is sufficient, since that keeps the number of moving parts down.
If you're set on in-memory data grids/stores, some options are Redis, Hazelcast, Terracotta BigMemory, and GridGain (Apache Ignite). I believe the last two have connectors to Hadoop that allow writing the results of MR jobs directly to the data grid (you'll need to confirm that functionality, though).
As I said, though, I recommend exhausting the HBase option before moving out of the stack.