Hortonworks Tools for Implementing Lambda Architecture

Contributor

Has a consensus formed around the best tool stack for implementing the Lambda Architecture on HDP? I'm particularly interested in the "serving" and "speed" layers. In "Big Data: Principles and best practices of scalable real-time data systems", Nathan Marz mentions using ElephantDB for the serving layer, but I'm trying to limit myself to tools included in the HDP/HDF stacks.

1 ACCEPTED SOLUTION

@Eric Brosch

Your tooling selection depends on your particular use case.

For the "Speed" layer, you can use Storm or Spark Streaming. IMHO the main selection criterion between the two is whether you need ultra-low latency (Storm) or high throughput (Spark Streaming). There are other factors, but these are among the main drivers.
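To make the latency versus throughput trade-off concrete, here is a minimal Python sketch contrasting the two processing styles: handling each event as it arrives (the tuple-at-a-time model Storm uses) versus grouping events into small batches (the micro-batch model Spark Streaming uses). It is an illustration only and calls neither framework. The function names are hypothetical, and batching by count is a simplification, since Spark Streaming forms batches by time interval.

```python
from typing import Callable, Iterable, List


def process_per_event(events: Iterable[dict],
                      handle: Callable[[dict], None]) -> int:
    """Tuple-at-a-time style: handle every event as soon as it arrives.

    Returns the number of handler invocations (one per event).
    """
    calls = 0
    for event in events:
        handle(event)
        calls += 1
    return calls


def process_micro_batches(events: Iterable[dict],
                          batch_size: int,
                          handle_batch: Callable[[List[dict]], None]) -> int:
    """Micro-batch style: buffer events and handle them one group at a time.

    Returns the number of handler invocations (one per batch). An event
    waits until its batch is full, which adds latency, but the per-call
    overhead is shared across the whole batch.
    """
    batch: List[dict] = []
    calls = 0
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            handle_batch(batch)
            calls += 1
            batch = []
    if batch:  # flush the final, partially filled batch
        handle_batch(batch)
        calls += 1
    return calls
```

With ten events and a batch size of four, the per-event path invokes its handler ten times while the micro-batch path invokes its handler three times. Fewer, larger calls is where the throughput gain comes from, and the waiting is where the extra latency comes from.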

For the "Serving" layer, your main choice is HBase. Depending on how you're going to query the "Serving" layer, you may want to consider putting Phoenix on top of HBase. Since HBase is a NoSQL store, it has its own API for making calls. Phoenix adds an abstraction layer on top of HBase and lets you write queries in SQL. Mind you, Phoenix is still in tech preview and may have some bugs here and there. Also, it's not meant for complex SQL queries.

For ingest and simple event processing, you can look into HDF/NiFi.

If you move beyond the HDP/HDF stack for the serving layer, your options expand to include other NoSQL stores as well as regular SQL databases.

Below is a diagram of a sample Lambda architecture for a demo that receives sensor data from trucks and analyzes it, along with driver behaviour, to determine the likelihood of a driver committing a traffic violation/infraction. It should give you a better idea of what a Lambda deployment may look like.

6463-lambda.jpg
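As a rough companion to the diagram, the Python sketch below shows how the three layers fit together at query time: a batch view recomputed from the full dataset, a speed view updated per event, and a serving-layer query that merges the two. Everything here is an assumption made for illustration. The event shape (driver ID plus integer timestamp), the names `build_batch_view`, `SpeedView`, and `query`, and the in-memory dictionaries standing in for HBase tables are all hypothetical and are not taken from the demo itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event: (driver_id, timestamp) for one recorded infraction.
Event = Tuple[str, int]


def build_batch_view(master_dataset: Iterable[Event],
                     batch_cutoff: int) -> Dict[str, int]:
    """Batch layer: recompute per-driver counts from all events before the cutoff."""
    view: Dict[str, int] = defaultdict(int)
    for driver_id, ts in master_dataset:
        if ts < batch_cutoff:
            view[driver_id] += 1
    return dict(view)


class SpeedView:
    """Speed layer: incrementally count events the batch view has not covered yet."""

    def __init__(self, batch_cutoff: int) -> None:
        self.batch_cutoff = batch_cutoff
        self.counts: Dict[str, int] = defaultdict(int)

    def on_event(self, event: Event) -> None:
        driver_id, ts = event
        if ts >= self.batch_cutoff:  # older events belong to the batch view
            self.counts[driver_id] += 1


def query(driver_id: str, batch_view: Dict[str, int], speed_view: SpeedView) -> int:
    """Serving layer: merge the batch result with the real-time increment."""
    return batch_view.get(driver_id, 0) + speed_view.counts.get(driver_id, 0)
```

The cutoff timestamp keeps the two views from double counting: an event lands in exactly one of them. When the next batch run finishes, the cutoff moves forward and the speed view can be discarded and rebuilt from that point.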

5 REPLIES

Master Guru

@Eric Brosch Phoenix and HDB (HAWQ) may be leveraged in a Lambda architecture. Phoenix supports secondary indexes, and HAWQ is a relational MPP database on HDP. Both can serve low-latency queries. Choosing between the two? For known query patterns, Phoenix will perform well. For unknown query patterns, HAWQ may be the way to go.

Contributor

Thank you, @Eyad Garelnabi and @Sunile Manjee. The database portion is where my primary concerns were.

Eyad, are you layering Phoenix on top of HBase for querying?

Master Guru

@Eric Brosch Phoenix is a SQL skin on top of HBase. Phoenix lets you create secondary indexes on HBase, which HBase does not provide natively. On HDP, Phoenix comes out of the box with HBase.
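For readers unfamiliar with the term, here is a minimal Python sketch of what a secondary index buys you in a store that is keyed by row key. It is a toy in-memory model, not the Phoenix or HBase API. The class `IndexedTable` and its methods are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


class IndexedTable:
    """Toy key-value table with one secondary index on a chosen column."""

    def __init__(self, indexed_column: str) -> None:
        self.indexed_column = indexed_column
        self.rows: Dict[str, dict] = {}                    # row key -> row
        self.index: Dict[object, Set[str]] = defaultdict(set)  # value -> row keys

    def put(self, row_key: str, row: dict) -> None:
        """Write a row and keep the secondary index in sync."""
        old = self.rows.get(row_key)
        if old is not None and self.indexed_column in old:
            # Drop the stale index entry before overwriting the row.
            self.index[old[self.indexed_column]].discard(row_key)
        self.rows[row_key] = dict(row)
        if self.indexed_column in row:
            self.index[row[self.indexed_column]].add(row_key)

    def get(self, row_key: str) -> Optional[dict]:
        """Primary access path: direct lookup by row key."""
        return self.rows.get(row_key)

    def find_by_scan(self, value: object) -> List[str]:
        """Without an index: inspect every row to match a non-key column."""
        return sorted(k for k, r in self.rows.items()
                      if r.get(self.indexed_column) == value)

    def find_by_index(self, value: object) -> List[str]:
        """With an index: jump straight to the matching row keys."""
        return sorted(self.index.get(value, set()))
```

Both lookup methods return the same row keys. The difference is cost: the scan touches every row, while the index lookup touches only the matches, at the price of extra work on every write.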

Contributor

HAWQ is good for nothing