Created 08-08-2016 01:46 PM
Has a consensus formed around the best tool stack for implementing the Lambda Architecture on HDP? I'm particularly interested in the "serving" and "speed" layers. In "Big Data: Principles and best practices of scalable real-time data systems", Nathan Marz mentions using ElephantDB for the serving layer, but I'm trying to limit myself to tools included in the HDP/HDF stacks.
Created on 08-08-2016 03:22 PM - edited 08-18-2019 03:55 AM
Your tooling selection really all depends on your particular use case.
For the "Speed" layer, you can use Storm or Spark Streaming. IMHO the main selection criterion between the two is whether you need ultra-low latency (Storm) or high throughput (Spark Streaming). There are other factors, but these are the main drivers.
For the "Serving" layer, your main choice is HBase. Depending on how you're going to query the "Serving" layer, you may want to consider putting Phoenix on top of HBase. Since HBase is a NoSQL store, it has its own API for making calls. Phoenix adds an abstraction layer on top of HBase and lets you write queries in SQL. Mind you, it's still in tech preview and may have some bugs here and there. Also, it's not meant for complex SQL queries.
For your ingest and simple event processing, you can look into HDF/NiFi.
If you move beyond the HDP/HDF stack for the serving layer then your options increase to include other NoSQL stores as well as regular SQL DBs.
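To make the role of the "Serving" layer concrete, here is a minimal sketch of the query-time merge in a Lambda architecture: a precomputed batch view is combined with the recent increments from the speed layer. It is plain Python with in-memory dictionaries standing in for the stores; the names `merge_views`, `batch_view`, and `realtime_view` and the sample counts are made up for illustration and are not part of any HDP API.

```python
from collections import Counter


def merge_views(batch_view, realtime_view):
    """Combine precomputed batch counts with recent speed-layer counts.

    Both arguments map a key (here a hypothetical truck id) to an event count.
    """
    merged = Counter(batch_view)
    # Counter.update adds counts for keys that exist and inserts new keys.
    merged.update(realtime_view)
    return dict(merged)


# Hypothetical sample data: violation events per truck.
batch_view = {"truck-1": 10, "truck-2": 4}      # from the last batch run
realtime_view = {"truck-2": 1, "truck-3": 2}    # events seen since that run

result = merge_views(batch_view, realtime_view)
print(result)
```

In a real deployment both views would typically live in the serving store (for example, HBase tables), and the merge would happen in the query layer or in the application.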
Below is a diagram of a sample Lambda architecture for a demo that receives sensor data from trucks and analyzes it, along with driver behaviour, to estimate the likelihood of a driver committing a traffic violation/infraction. It will give you a better idea of what a Lambda deployment may look like.
Created 08-08-2016 02:33 PM
@Eric Brosch Phoenix and HDB (HAWQ) may be leveraged in a Lambda architecture. Phoenix supports secondary indexes, and HAWQ is a relational MPP database on HDP. Both can serve low-latency queries. Choosing between the two? For known query patterns, Phoenix will perform well. For unknown query patterns, HAWQ may be the way to go.
Created 08-08-2016 04:05 PM
Thank you, @Eyad Garelnabi and @Sunile Manjee . The database portion is where my primary concerns were.
Eyad, are you layering Phoenix on top of HBase for querying?
Created 08-08-2016 04:11 PM
@Eric Brosch Phoenix is a SQL skin on top of HBase. Phoenix lets you create secondary indexes on HBase tables, which HBase does not provide natively. On HDP, Phoenix ships out of the box with HBase.
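As a rough illustration of why a secondary index matters for known query patterns, the toy sketch below compares a full scan with an index lookup. It is plain Python over an in-memory dictionary and does not reflect how Phoenix actually stores or maintains its indexes; the row keys, column names, and helper functions are all hypothetical.

```python
from collections import defaultdict

# Toy "table" keyed by row key, the only native lookup path in this model.
rows = {
    "truck-1|2016-08-08T13:00": {"driver": "alice", "speed": 72},
    "truck-2|2016-08-08T13:01": {"driver": "bob", "speed": 55},
    "truck-3|2016-08-08T13:02": {"driver": "alice", "speed": 80},
}


def build_index(table, column):
    """Map each value of `column` to the row keys that contain it."""
    index = defaultdict(list)
    for row_key, cols in table.items():
        index[cols[column]].append(row_key)
    return index


def scan_by_driver(table, driver):
    """Without an index: inspect every row."""
    return [key for key, cols in table.items() if cols["driver"] == driver]


def lookup_by_driver(index, driver):
    """With an index: go straight to the matching row keys."""
    return index.get(driver, [])


driver_index = build_index(rows, "driver")
print(lookup_by_driver(driver_index, "alice"))
```

Both functions return the same row keys; the difference is that the scan touches every row, while the lookup only works for the column that was indexed ahead of time, which is why the query pattern has to be known in advance.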
Created 11-09-2017 07:19 PM
HAWQ is good for nothing