Created on 10-23-2014 01:25 AM - edited 09-16-2022 02:10 AM
I have been reading a lot lately about the Lambda Architecture paradigm from Nathan Marz.
Since CDH is perfect for the Batch Layer of such an architecture, I was wondering whether it is possible to save the precomputed views from Hadoop into Cassandra. How would that work?
The idea I'm evaluating is: use Hadoop for the batch layer and Storm for the speed layer, and save both the batch views and the real-time views in Cassandra. The client can then query Cassandra directly (for example, using Facebook's Presto). I want to know whether something like this is possible.
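To make the idea concrete, here is a minimal sketch of the query-time merge this design implies. It is only an illustration: the two `Counter` objects are hypothetical in-memory stand-ins for a batch-view table and a real-time-view table, and `query_pageviews` is an invented name, not part of Cassandra, Storm, or Presto.

```python
from collections import Counter

# Hypothetical stand-ins for two tables in the serving store.
# batch_view: totals precomputed by the batch layer (e.g. a Hadoop job).
# realtime_view: increments seen by the speed layer since the last batch run.
batch_view = Counter({"page_a": 100, "page_b": 40})
realtime_view = Counter({"page_a": 3, "page_c": 1})


def query_pageviews(page):
    """Answer a query by combining the batch result with the recent delta."""
    # Counter returns 0 for keys it has never seen, so a page that exists
    # in only one of the two views is still handled.
    return batch_view[page] + realtime_view[page]
```

With a single store holding both views, the client issues one logical query and the merge is a simple sum per key.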
Created 10-29-2014 10:04 AM
Created 10-23-2014 01:29 AM
Created 10-23-2014 01:41 AM
Thank you for the reply but it does not answer my question.
Spark is just a substitute for Storm, and I'm sure it is even better than Storm, but my question was mostly about Cassandra.
The precomputed views from the batch and speed layer need to be stored somewhere in order to be able to query them. Can Cassandra store the batch views from Hadoop and how?
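As a rough sketch of what "storing a batch view" could mean in practice, the view can be flattened into plain keyed rows, which is the shape any key-value or wide-column table would accept. Everything here is an assumption for illustration: the event format, the `version` column (so each batch run writes a fresh copy that readers can switch to), and the function names are hypothetical, and no real Cassandra or Hadoop API is used.

```python
from collections import Counter


def build_batch_view(master_events, cutoff):
    """Recompute the view from all raw (timestamp, page) events up to cutoff."""
    view = Counter()
    for timestamp, page in master_events:
        if timestamp <= cutoff:
            view[page] += 1
    return view


def to_rows(view, version):
    """Flatten the view into (version, key, value) rows ready for a bulk write."""
    return [(version, page, count) for page, count in sorted(view.items())]


# Hypothetical master dataset; the event at timestamp 9 is after the cutoff
# and would be covered by the speed layer instead.
events = [(1, "page_a"), (2, "page_b"), (3, "page_a"), (9, "page_a")]
rows = to_rows(build_batch_view(events, cutoff=5), version=1)
```

Writing each run under a new version keeps the previous batch view readable until the new one is complete.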
Created 10-23-2014 01:36 PM
Have you explored either Impala or Apache HBase for this use case?
Created 10-24-2014 01:49 AM
Yes, Impala is an SQL query engine, so in my case it could substitute for Presto (although Impala supports fewer SQL commands than Presto).
Can I query Cassandra using Impala? From what I've read until now, it seems like not.
HBase is an alternative to Cassandra, and it would fit my case if it is scalable enough, which I'm not sure about. Would it also be possible to store both batch and real-time views in HBase?
Created on 10-24-2014 09:43 AM - edited 10-24-2014 09:43 AM
Regarding Impala vs Presto: As of 2.0, Impala supports a wide range of SQL operations (docs). So I would not automatically make that assumption.
And no, Impala does not query Cassandra -- which leads to the question, what is it about your use case that implies Cassandra over HBase, which offers very similar capabilities but also things like strong consistency, coprocessors, and the availability of a nice GUI (Hue), if you're into that?
Created 10-29-2014 03:44 AM
What makes me consider Cassandra is that if I can store both batch and real-time views in it, the Lambda Architecture can be simplified: only one store is needed for both types of views.
I'm not sure if I can do that with HBase too. I need to store both batch and real-time views and then query these views using SQL. Can I do that with HBase + Impala?