
Hortonworks Tools for Implementing Lambda Architecture

ekbrosch — Mon, 08 Aug 2016 20:46:44 GMT

Has a consensus formed around the best tool stack for implementing the Lambda Architecture on HDP? I'm particularly interested in the "serving" and "speed" layers. In "Big Data: Principles and best practices of scalable real-time data systems", Nathan Marz mentions using ElephantDB for the serving layer, but I'm trying to limit myself to tools included in the HDP/HDF stacks.

Re: Hortonworks Tools for Implementing Lambda Architecture

sunile_manjee — Mon, 08 Aug 2016 21:33:10 GMT

@Eric Brosch Phoenix and HDB (HAWQ) can both be used in a Lambda architecture. Phoenix supports secondary indexes, and HAWQ is a relational MPP database that runs on HDP. Both can serve low-latency queries. Choosing between the two? For known query patterns, Phoenix will perform well; for unknown query patterns, HAWQ may be the way to go.

Re: Hortonworks Tools for Implementing Lambda Architecture

egarelnabi — Sun, 18 Aug 2019 10:55:36 GMT

@Eric Brosch

Your tooling selection really all depends on your particular use case.

For the "Speed" layer, you can use Storm or Spark Streaming. IMHO, the main selection criterion between the two is whether you need ultra-low latency (Storm) or high throughput (Spark Streaming). There are other factors, but these are some of the main drivers.
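To make the latency side of that trade-off concrete: core Storm handles each tuple as it arrives, while Spark Streaming (the DStream API) groups the input stream into micro-batches on a fixed batch interval, so an event can wait up to one interval before its batch is processed. The toy model below ignores processing cost and per-record overhead (which is where micro-batching earns its throughput advantage) and measures only that waiting time. The arrival times and the 1-second interval are made-up numbers.

```python
import math

def per_event_latency(arrivals):
    """Record-at-a-time: each event is handled on arrival (processing cost ignored)."""
    return [0.0 for _ in arrivals]

def micro_batch_latency(arrivals, interval):
    """Micro-batch: an event waits until the end of the window [k*interval, (k+1)*interval) it falls in."""
    latencies = []
    for t in arrivals:
        batch_end = (math.floor(t / interval) + 1) * interval
        latencies.append(batch_end - t)
    return latencies

arrivals = [0.1, 0.4, 0.95, 1.2, 1.99]  # seconds, hypothetical
interval = 1.0                           # hypothetical batch interval in seconds

print(max(per_event_latency(arrivals)))
print(max(micro_batch_latency(arrivals, interval)))
```

Shrinking the interval lowers this wait but schedules more, smaller batches, which is the throughput side of the same trade-off.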

For the "Serving" layer, your main choice is HBase. Depending on how you're going to query the "Serving" layer, you may want to consider putting Phoenix on top of HBase. Since HBase is a NoSQL store, it has its own API for making calls. Phoenix adds an abstraction layer on top of HBase that lets you write your queries in SQL. Mind you, it's still in tech preview and may have some bugs here and there. Also, it's not meant for complex SQL queries.
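HBase's native access paths are a point lookup by row key (Get) and a range scan over sorted row keys (Scan), so serving-layer design mostly comes down to row-key design. Phoenix maps a table's primary-key columns onto the HBase row key, which is roughly how a SQL filter on the leading key column becomes a range scan. The sketch below is a hypothetical in-memory stand-in for an HBase table, not the HBase client API. The composite `<truckId>|<timestamp>` key and the sample rows are assumptions for illustration.

```python
import bisect

class ToyRowStore:
    """Hypothetical stand-in for an HBase table: rows kept sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys in sorted (lexicographic) order
        self._rows = {}  # row key -> dict of columns

    def put(self, row_key, columns):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = columns

    def get(self, row_key):
        """Point lookup by exact row key (analogous to an HBase Get)."""
        return self._rows.get(row_key)

    def scan(self, start, stop):
        """Rows with start <= key < stop, in key order (analogous to an HBase Scan)."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

# Composite row key "<truckId>|<timestamp>" keeps one truck's events contiguous.
store = ToyRowStore()
store.put("truck07|2016-08-08T20:00", {"speed": 61})
store.put("truck07|2016-08-08T20:05", {"speed": 74})
store.put("truck12|2016-08-08T20:01", {"speed": 55})

# "All events for truck07" is a prefix range scan; '}' sorts immediately after '|'.
truck07 = store.scan("truck07|", "truck07}")
print(len(truck07))
```

A query that does not start with the truck ID (for example, "all events above 70 mph") gets no help from this key and would have to scan every row.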

For your ingest and simple event processing you can look into HDF/Nifi.

If you move beyond the HDP/HDF stack for the serving layer, your options expand to include other NoSQL stores as well as regular SQL DBs.

Below is a diagram of a sample Lambda architecture for a demo that receives sensor data from trucks and analyzes it, along with driver behaviour, to determine the likelihood of a driver committing a traffic violation. It will give you a better idea of what a Lambda deployment may look like.

Re: Hortonworks Tools for Implementing Lambda Architecture

ekbrosch — Mon, 08 Aug 2016 23:05:26 GMT

Thank you, @Eyad Garelnabi and @Sunile Manjee. The database portion was my primary concern.

Eyad, are you layering Phoenix on top of HBase for querying?

Re: Hortonworks Tools for Implementing Lambda Architecture

sunile_manjee — Mon, 08 Aug 2016 23:11:07 GMT

@Eric Brosch Phoenix is a SQL skin on top of HBase. Phoenix lets you create secondary indexes on HBase, which HBase does not support natively. On HDP, Phoenix comes out of the box with HBase.
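Why a secondary index matters on HBase: only the row key is sorted, so filtering on any other column means touching every row. A Phoenix global secondary index is itself a separate table whose row key leads with the indexed column and points back to the data row. The toy Python sketch below loosely models that idea with plain lists and dicts. The `driver` column, the sample rows, and the function names are hypothetical, and this is not Phoenix's actual storage format.

```python
import bisect

# Hypothetical data table: row key -> columns (only the row key is ordered).
rows = {
    "truck07|2016-08-08T20:00": {"driver": "d42", "speed": 61},
    "truck07|2016-08-08T20:05": {"driver": "d42", "speed": 74},
    "truck12|2016-08-08T20:01": {"driver": "d17", "speed": 55},
}

def find_by_driver_full_scan(driver):
    """Without an index, filtering on a non-key column inspects every row."""
    return sorted(k for k, cols in rows.items() if cols["driver"] == driver)

# Index "table": sorted keys of the form "<driver>|<data row key>".
index_keys = sorted(f"{cols['driver']}|{k}" for k, cols in rows.items())

def find_by_driver_indexed(driver):
    """With the index, the lookup is a prefix range scan ('}' sorts right after '|')."""
    lo = bisect.bisect_left(index_keys, driver + "|")
    hi = bisect.bisect_left(index_keys, driver + "}")
    # Strip the "<driver>|" prefix to recover the data table's row key.
    return [key.split("|", 1)[1] for key in index_keys[lo:hi]]

print(find_by_driver_indexed("d42"))
```

The price is that the index must be updated on every write; when you declare one with Phoenix's `CREATE INDEX`, Phoenix maintains it for you instead of your application code.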

Re: Hortonworks Tools for Implementing Lambda Architecture

milindmore — Fri, 10 Nov 2017 03:19:59 GMT

HAWQ is good for nothing