Lambda Architecture (Nathan Marz)
Created on ‎10-23-2014 01:25 AM - edited ‎09-16-2022 02:10 AM
I have been reading a lot lately about Nathan Marz's Lambda Architecture paradigm.
Since CDH is well suited to the batch layer of such an architecture, I was wondering whether the precomputed views from Hadoop could be saved into Cassandra. How would that work?
The idea I'm evaluating is: Hadoop for the batch layer, Storm for the speed layer, and Cassandra to store both the batch views and the real-time views. The client can then query Cassandra directly (for example, using Facebook's Presto). Is something like this possible?
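To make the query side of this idea concrete, here is a minimal sketch of how a client might combine a batch view with a real-time view. It assumes the views hold simple per-key counts. The dictionaries `batch_view` and `realtime_view` and the function `query_count` are hypothetical in-memory stand-ins for tables in the serving store, not part of any Cassandra, Presto, or CDH API.

```python
# Hypothetical stand-ins for the two views. In the setup described above,
# these would be tables in the serving store rather than Python dicts.
batch_view = {"page_a": 1000, "page_b": 250}   # precomputed by the batch layer
realtime_view = {"page_a": 12, "page_c": 3}    # increments since the last batch run


def query_count(key, batch, realtime):
    """Answer a query by merging both views: batch result plus recent increments."""
    return batch.get(key, 0) + realtime.get(key, 0)


for page in ("page_a", "page_b", "page_c"):
    print(page, query_count(page, batch_view, realtime_view))
```

The merge here is a plain sum because the example metric is a count; other metrics would need their own merge rule.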
Created ‎10-23-2014 01:29 AM
http://blog.cloudera.com/blog/2014/08/building-lambda-architecture-with-spark-streaming/
Gautam Gopalakrishnan
Created ‎10-23-2014 01:41 AM
Thank you for the reply, but it does not answer my question.
Spark would only replace Storm. I'm sure it is even better than Storm, but my question was mostly about Cassandra.
The precomputed views from the batch and speed layers need to be stored somewhere so that they can be queried. Can Cassandra store the batch views from Hadoop, and if so, how?
Created ‎10-23-2014 01:36 PM
Have you explored either Impala or Apache HBase for this use case?
Created ‎10-24-2014 01:49 AM
Yes. Impala is a SQL query engine, so in my case it could replace Presto (although Impala supports fewer SQL commands than Presto).
Can I query Cassandra using Impala? From what I've read so far, it seems not.
HBase is an alternative to Cassandra and would fit my case if it is scalable enough, which I'm not sure about. Would it be possible to store both batch and real-time views in HBase?
Created on ‎10-24-2014 09:43 AM - edited ‎10-24-2014 09:43 AM
Regarding Impala vs Presto: As of 2.0, Impala supports a wide range of SQL operations (docs). So I would not automatically make that assumption.
And no, Impala does not query Cassandra -- which leads to the question, what is it about your use case that implies Cassandra over HBase, which offers very similar capabilities but also things like strong consistency, coprocessors, and the availability of a nice GUI (Hue), if you're into that?
Created ‎10-29-2014 03:44 AM
What makes me consider Cassandra is that if I can store both batch and real-time views in it, the Lambda Architecture becomes simpler: a single store for both types of views.
I'm not sure whether I can do that with HBase too. I need to store both batch and real-time views and then query these views using SQL. Can I do that with HBase + Impala?
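As a rough illustration of the "single store for both types of views" idea, the sketch below keeps batch results and real-time increments in one object and discards real-time entries once a newer batch view covers them. `ViewStore` and its methods are hypothetical names made up for this example. It is a toy in-memory model that does not use any HBase, Impala, or Cassandra API, and it assumes each event carries a numeric timestamp.

```python
class ViewStore:
    """Toy single store holding both the batch view and the real-time view."""

    def __init__(self):
        self.batch = {}        # key -> count computed by the batch layer
        self.batch_cutoff = 0  # events up to this timestamp are in the batch view
        self.realtime = []     # (timestamp, key, delta) rows from the speed layer

    def add_realtime(self, timestamp, key, delta=1):
        """Speed layer: record one increment as it arrives."""
        self.realtime.append((timestamp, key, delta))

    def load_batch(self, view, cutoff):
        """Batch layer: swap in a new view and drop increments it already covers."""
        self.batch = dict(view)
        self.batch_cutoff = cutoff
        self.realtime = [row for row in self.realtime if row[0] > cutoff]

    def query(self, key):
        """Batch count plus increments newer than the batch cutoff."""
        recent = sum(delta for ts, k, delta in self.realtime
                     if k == key and ts > self.batch_cutoff)
        return self.batch.get(key, 0) + recent


store = ViewStore()
store.add_realtime(5, "page_a")
store.add_realtime(15, "page_a")
store.load_batch({"page_a": 1}, cutoff=10)  # the batch run covered the event at t=5
print(store.query("page_a"))
```

Whichever database ends up holding the views, the point of the sketch is only that both view types can sit behind one query path as long as the batch cutoff is tracked.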
Created ‎10-29-2014 10:04 AM
You could also evaluate Apache Phoenix as another SQL-over-HBase option
(not currently supported by Cloudera though).
