Support Questions

ekbrosch · ‎08-08-2016

Has a consensus formed around the best tool stack for implementing the Lambda Architecture on HDP? I'm particularly interested in the "serving" and "speed" layers. In "Big Data: Principles and best practices of scalable real-time data systems", Nathan Marz mentions using ElephantDB for the serving layer, but I'm trying to limit myself to tools included in the HDP/HDF stacks.

egarelnabi · ‎08-08-2016

@Eric Brosch

Your tooling selection really all depends on your particular use case.

For the "Speed" layer, you can use Storm or Spark Streaming. IMHO the main selection criterion between the two is whether you need ultra-low latency (Storm) or high throughput (Spark Streaming). There are other factors, but these are the main drivers.
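
To illustrate the latency/throughput trade-off, here is a minimal Python sketch contrasting per-event handling (the Storm style) with micro-batching (the Spark Streaming style). It uses neither framework's API. The function names are hypothetical, and batching by count is a simplification: Spark Streaming batches by time interval.

```python
from typing import Callable, Iterable, List


def process_per_event(events: Iterable[int],
                      handle: Callable[[int], None]) -> None:
    """Per-event style: each event is handled as soon as it arrives,
    so latency per event is low but overhead is paid on every event."""
    for event in events:
        handle(event)


def process_micro_batches(events: Iterable[int],
                          batch_size: int,
                          handle_batch: Callable[[List[int]], None]) -> None:
    """Micro-batch style: events are buffered and handled in groups.
    An event waits for its batch to fill (higher latency), but the
    per-call overhead is amortized over the batch (higher throughput).
    ASSUMPTION: batches close by count here, not by time interval."""
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            handle_batch(batch)
            batch = []  # start a fresh buffer for the next batch
    if batch:
        handle_batch(batch)  # flush the final, partially filled batch
```

In the first function every event triggers a call; in the second, five events with a batch size of two trigger only three calls.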

For the "Serving" layer, your main choice is HBase. Depending on how you're going to query the "Serving" layer, you may want to consider putting Phoenix on top of HBase. Since HBase is a NoSQL store, it has its own API for making calls. Phoenix adds an abstraction layer on top of HBase and lets you write queries in SQL. Mind you, it's still in tech preview and may have some bugs here and there. Also, it's not meant for complex SQL queries.

For ingest and simple event processing, you can look into HDF/NiFi.

If you move beyond the HDP/HDF stack for the serving layer then your options increase to include other NoSQL stores as well as regular SQL DBs.

Below is a diagram of a sample Lambda architecture for a demo that receives sensor data from trucks and analyzes it, along with driver behaviour, to determine the likelihood of a driver committing a traffic violation/infraction. It will give you a better idea of what a Lambda deployment may look like.
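
As a rough illustration of how the layers fit together at query time, here is a minimal Python sketch. It is not the demo's actual code: the event shape, the event names, and the functions `build_view` and `query` are all hypothetical. The idea is that the batch layer precomputes a view over historical data, the speed layer keeps a view over events that arrived since the last batch run, and a query merges the two.

```python
from collections import Counter
from typing import Iterable, Tuple

# ASSUMPTION: an event is (driver_id, event_type); "normal" means no violation.
Event = Tuple[str, str]


def build_view(events: Iterable[Event]) -> Counter:
    """Count non-normal (violation-type) events per driver.
    The same logic stands in for both the batch and the speed layer;
    in a real deployment these would be separate jobs."""
    return Counter(driver for driver, kind in events if kind != "normal")


def query(driver_id: str, batch_view: Counter, realtime_view: Counter) -> int:
    """Serving-side merge: batch result plus the recent delta."""
    return batch_view.get(driver_id, 0) + realtime_view.get(driver_id, 0)


# Historical events, already processed by the batch layer.
batch_view = build_view([
    ("d1", "speeding"),
    ("d1", "normal"),
    ("d2", "lane_departure"),
])

# Events that arrived after the last batch run (speed layer).
realtime_view = build_view([("d1", "speeding")])

print(query("d1", batch_view, realtime_view))
```

In an HDP deployment the two views would typically live in the serving store (for example HBase tables) rather than in memory.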

sunile_manjee · ‎08-08-2016

@Eric Brosch Phoenix and HDB (HAWQ) may both be leveraged in a Lambda architecture. Phoenix supports secondary indexes, and HAWQ is a relational MPP database on HDP. Both can serve low-latency queries. Choosing between the two? For known query patterns, Phoenix will perform well. For unknown query patterns, HAWQ may be the way to go.

ekbrosch · ‎08-08-2016

Thank you, @Eyad Garelnabi and @Sunile Manjee. The database portion is where my primary concerns were.

Eyad, are you layering Phoenix on top of HBase for querying?

sunile_manjee · ‎08-08-2016

@Eric Brosch Phoenix is a SQL skin on top of HBase. Phoenix lets you create secondary indexes on HBase tables, which HBase does not provide natively. On HDP, Phoenix comes out of the box with HBase.
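
For anyone unfamiliar with the term, here is a conceptual Python sketch of what a secondary index buys you. It is not how Phoenix implements indexes, and the class and method names are hypothetical. The primary store is keyed by row key (as in HBase), and a second mapping from a column value back to row keys lets you look rows up by that column without scanning every row.

```python
from collections import defaultdict
from typing import Any, Dict, List


class IndexedTable:
    """Toy key-value table with one secondary index (illustration only)."""

    def __init__(self, indexed_column: str) -> None:
        self.indexed_column = indexed_column
        self.rows: Dict[str, Dict[str, Any]] = {}   # row key -> row
        self.index = defaultdict(set)               # column value -> row keys

    def put(self, row_key: str, row: Dict[str, Any]) -> None:
        """Write a row and keep the index in sync with it."""
        old = self.rows.get(row_key)
        if old is not None:
            # Remove the stale index entry before overwriting the row.
            self.index[old[self.indexed_column]].discard(row_key)
        self.rows[row_key] = row
        self.index[row[self.indexed_column]].add(row_key)

    def find_by_indexed(self, value: Any) -> List[Dict[str, Any]]:
        """Look rows up by the indexed column without a full scan."""
        return [self.rows[key] for key in sorted(self.index.get(value, ()))]


table = IndexedTable("driver")
table.put("truck1", {"driver": "alice", "speed": 70})
table.put("truck2", {"driver": "bob", "speed": 55})
table.put("truck3", {"driver": "alice", "speed": 62})

print(table.find_by_indexed("alice"))
```

The cost is the extra write on every `put`, which is the usual trade-off with secondary indexes: faster reads on known query patterns in exchange for more work at write time.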

milindmore · ‎11-09-2017

HAWQ is good for nothing
