Lambda Architecture (Nathan Marz)

Explorer

I have been reading a lot lately about Nathan Marz's Lambda Architecture paradigm.

Since CDH is a perfect fit for the batch layer of such an architecture, I was wondering whether it is possible to save the precomputed views from Hadoop into Cassandra. How would that work?

The idea I'm evaluating is: Hadoop for the batch layer, Storm for the speed layer, and Cassandra for storing both the batch views and the real-time views. Clients could then query Cassandra directly (for example, using Facebook's Presto). Is something like this possible?
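To make the idea concrete, here is a minimal Python sketch of the query-time merge the Lambda Architecture relies on. The batch layer recomputes a view from the whole master dataset up to a cutoff, the speed layer counts only events at or after that cutoff, and a query combines the two. The pageview-count use case, the event shape, and all function names are hypothetical, and plain dicts stand in for whatever store (Cassandra or otherwise) would actually hold the views.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple

# Hypothetical event shape: (timestamp, page_url).
Event = Tuple[int, str]


def compute_batch_view(master_dataset: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: recompute pageview counts from scratch over all events before `cutoff`."""
    return Counter(page for ts, page in master_dataset if ts < cutoff)


def update_realtime_view(rt_view: Counter, event: Event, cutoff: int) -> None:
    """Speed layer: incrementally count only events the last batch run has not covered."""
    ts, page = event
    if ts >= cutoff:
        rt_view[page] += 1


def query(page: str, batch_view: Mapping[str, int], rt_view: Mapping[str, int]) -> int:
    """Query time: combine the batch view result with the real-time view result."""
    return batch_view.get(page, 0) + rt_view.get(page, 0)


if __name__ == "__main__":
    events = [(1, "/a"), (2, "/b"), (3, "/a"), (7, "/a")]
    cutoff = 5  # the last batch run saw everything with ts < 5
    batch = compute_batch_view(events, cutoff)
    rt = Counter()
    for e in events:
        update_realtime_view(rt, e, cutoff)
    print(query("/a", batch, rt))  # 3: two from the batch view, one from the real-time view
```

The point of the split is that the batch view can be rebuilt wholesale and replaced, while the real-time view only has to cover the gap since the last batch run.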

Here is an example of implementing the Lambda Architecture using Spark Streaming on CDH:
http://blog.cloudera.com/blog/2014/08/building-lambda-architecture-with-spark-streaming/


Regards,
Gautam Gopalakrishnan

Explorer

Thank you for the reply, but it does not answer my question.

Spark is just a substitute for Storm, and I'm sure it is even better than Storm, but my question was mostly about Cassandra.

The precomputed views from the batch and speed layers need to be stored somewhere so that they can be queried. Can Cassandra store the batch views from Hadoop, and if so, how?
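One pattern that could answer the "how": have each batch run write a complete new version of the view next to the live one, then switch readers over once the load finishes, so queries never see a half-written view. The sketch below models that with an in-memory class. It is not Cassandra driver code; the (version, key) row layout and the class and method names are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple


class VersionedViewStore:
    """In-memory stand-in for a table keyed by (version, key), e.g. a Cassandra
    table. The layout and names are hypothetical, not a driver API."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[int, str], int] = {}
        self._current: Optional[int] = None  # version that readers currently see

    def bulk_load(self, version: int, view: Dict[str, int]) -> None:
        """Write a complete batch view under a new version; live readers are unaffected."""
        for key, value in view.items():
            self._rows[(version, key)] = value

    def publish(self, version: int) -> None:
        """Point readers at `version` once its load has finished, then drop the old one."""
        previous, self._current = self._current, version
        if previous is not None and previous != version:
            # A real deployment might keep the previous version around for rollback.
            self._rows = {k: v for k, v in self._rows.items() if k[0] != previous}

    def get(self, key: str) -> int:
        """Read a key from the currently published version (0 if absent)."""
        if self._current is None:
            return 0
        return self._rows.get((self._current, key), 0)
```

Usage would be `bulk_load(2, new_view)` while version 1 keeps serving, then `publish(2)`. As far as I know, Cassandra does not isolate writes across partitions, so some explicit switch like this is what would keep readers from mixing old and new batch results.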

Master Collaborator

Have you explored either Impala or Apache HBase for this use case?

Explorer

Yes. Impala is a SQL query engine, so in my case it could substitute for Presto (although Impala supports fewer SQL commands than Presto).

Can I query Cassandra using Impala? From what I've read so far, it seems not.

HBase is an alternative to Cassandra, and it would fit my case if it is scalable enough, which I'm not sure about. Would it also be possible to store both batch and real-time views in HBase?
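As a rough illustration of keeping both views in one HBase-style table, the sketch below gives each row a `batch` column family for the precomputed value and an `rt` family with one cell per time bucket for speed-layer increments. When a new batch view lands, real-time buckets older than its cutoff are dropped because the batch value now covers them. This is a plain Python model of the layout, not the HBase client API, and the family names and bucketing scheme are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict

# Row key -> {"family:qualifier": value}, loosely mirroring how an HBase row
# groups cells into column families. A plain Python model, not an HBase client.
Table = DefaultDict[str, Dict[str, int]]


def new_table() -> Table:
    return defaultdict(dict)


def increment_realtime(table: Table, key: str, bucket: int) -> None:
    """Speed layer: count one event in the 'rt' family, one cell per time bucket."""
    col = f"rt:{bucket}"
    table[key][col] = table[key].get(col, 0) + 1


def load_batch(table: Table, view: Dict[str, int], cutoff_bucket: int) -> None:
    """Batch layer: write the new batch view and drop real-time cells it now covers."""
    for row in table.values():
        stale = [c for c in row if c.startswith("rt:") and int(c[3:]) < cutoff_bucket]
        for col in stale:
            del row[col]
    for key, value in view.items():
        table[key]["batch:count"] = value


def read(table: Table, key: str) -> int:
    """Query time: merge the 'batch' and 'rt' families of one row."""
    row = table.get(key, {})
    return row.get("batch:count", 0) + sum(
        v for c, v in row.items() if c.startswith("rt:")
    )
```

Keeping the time bucket in the real-time cell's qualifier is what makes it safe to discard only the increments a given batch run has absorbed, even if new events keep arriving while the batch job runs.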

Master Collaborator

Regarding Impala vs. Presto: as of 2.0, Impala supports a wide range of SQL operations (see the Impala documentation), so I would not automatically make that assumption.

And no, Impala does not query Cassandra. That leads to the question: what is it about your use case that implies Cassandra over HBase, which offers very similar capabilities but also things like strong consistency, coprocessors, and the availability of a nice GUI (Hue), if you're into that?

Explorer

What makes me consider Cassandra is that, if I can store both batch and real-time views in it, the Lambda Architecture is simplified: a single store for both types of views.

I'm not sure whether I can do the same with HBase. I need to store both batch and real-time views and then query these views using SQL. Can I do that with HBase + Impala?
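On the SQL side, merging the two views is essentially a `UNION ALL` followed by an aggregate. The sketch below uses Python's built-in `sqlite3` purely as a local stand-in for Impala; the `batch_view`/`realtime_view` table and column names are hypothetical. With HBase, the equivalent tables would presumably be HBase-backed tables registered in the Hive metastore, which Impala can query, but that DDL is not shown here.

```python
import sqlite3
from typing import List, Tuple

# Merge the two views at query time: stack them, then aggregate per key.
# UNION ALL (not UNION) so identical rows from both views are not deduplicated.
MERGE_SQL = """
SELECT page, SUM(views) AS views
FROM (
    SELECT page, views FROM batch_view
    UNION ALL
    SELECT page, views FROM realtime_view
) AS merged
GROUP BY page
ORDER BY page
"""


def merged_views(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    return conn.execute(MERGE_SQL).fetchall()


def demo() -> List[Tuple[str, int]]:
    """Build two small hypothetical view tables in memory and run the merge query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(
            """
            CREATE TABLE batch_view (page TEXT, views INTEGER);
            CREATE TABLE realtime_view (page TEXT, views INTEGER);
            """
        )
        conn.executemany("INSERT INTO batch_view VALUES (?, ?)", [("/a", 100), ("/b", 40)])
        conn.executemany("INSERT INTO realtime_view VALUES (?, ?)", [("/a", 3), ("/c", 1)])
        return merged_views(conn)
    finally:
        conn.close()
```

Here `demo()` returns `[("/a", 103), ("/b", 40), ("/c", 1)]`: keys present in only one view still show up, and keys present in both are summed.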

Master Collaborator (accepted solution)
You can, yes. I encourage you to ask detailed questions in the HBase area.

You could also evaluate Apache Phoenix as another SQL-over-HBase option
(not currently supported by Cloudera though).